Publication Type

Journal Article

Version

acceptedVersion

Publication Date

7-2021

Abstract

This paper conducts a systematic study on the role of visual attention in video object pattern understanding. By elaborately annotating three popular video segmentation datasets (DAVIS) with dynamic eye-tracking data in the unsupervised video object segmentation (UVOS) setting. For the first time, we quantitatively verified the high consistency of visual attention behavior among human observers, and found strong correlation between human attention and explicit primary object judgments during dynamic, task-driven viewing. Such novel observations provide an in-depth insight of the underlying rationale behind video object pattens. Inspired by these findings, we decouple UVOS into two sub-tasks: UVOS-driven Dynamic Visual Attention Prediction (DVAP) in spatiotemporal domain, and Attention-Guided Object Segmentation (AGOS) in spatial domain. Our UVOS solution enjoys three major advantages: 1) modular training without using expensive video segmentation annotations, instead, using more affordable dynamic fixation data to train the initial video attention module and using existing fixation-segmentation paired static/image data to train the subsequent segmentation module; 2) comprehensive foreground understanding through multi-source learning; and 3) additional interpretability from the biologically-inspired and assessable attention. Experiments on four popular benchmarks show that, even without using expensive video object mask annotations, our model achieves compelling performance compared with state-of-the-arts and enjoys fast processing speed (10 fps on a single GPU). Our collected eye-tracking data and algorithm implementations have been made publicly available athttps://github.com/wenguanwang/AGS.

Keywords

Video object pattern understanding, unsupervised video object segmentation, top-down visual attention, video salient object detection

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Research Areas

Data Science and Engineering

Publication

IEEE Transactions on Pattern Analysis and Machine Intelligence

Volume

Issue

First Page

2413

Last Page

2428

ISSN

0162-8828

Identifier

10.1109/TPAMI.2020.2966453

Publisher

Institute of Electrical and Electronics Engineers

Citation

WANG, Wenguan; SHEN, Jianbing; LU, Xiankai; HOI, Steven C. H.; and LING, Haibin. Paying attention to video object pattern understanding. (2021). IEEE Transactions on Pattern Analysis and Machine Intelligence. 43, (7), 2413-2428.
Available at: https://ink.library.smu.edu.sg/sis_research/6960

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/TPAMI.2020.2966453

Download

Included in

Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons

COinS

Research Collection School Of Computing and Information Systems

Paying attention to video object pattern understanding

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

Volume

Issue

First Page

Last Page

ISSN

Identifier

Publisher

Citation

Copyright Owner and License

Creative Commons License

Additional URL

Included in

Search

Links

Browse

Links

Research Collection School Of Computing and Information Systems

Paying attention to video object pattern understanding

Author

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

Volume

Issue

First Page

Last Page

ISSN

Identifier

Publisher

Citation

Copyright Owner and License

Creative Commons License

Additional URL

Included in

Share

Search

Links

Browse

Links