Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2021

Abstract

This paper tackles the task of Few-Shot Video Object Segmentation (FSVOS), i.e., segmenting objects in query videos that belong to a class specified by a few labeled support images. The key is to model the relationship between the query videos and the support images in order to propagate the object information. This is a many-to-many problem and often relies on full-rank attention, which is computationally intensive. In this paper, we propose a novel Domain Agent Network (DAN), which breaks the full-rank attention down into two smaller ones. We treat a single frame of the query video as the domain agent, bridging the support images and the query video. Our DAN achieves linear space and time complexity, as opposed to the original quadratic form, with no loss of performance. In addition, we introduce a learning strategy that combines meta-learning with online learning to further improve segmentation accuracy. We build an FSVOS benchmark on the YouTube-VIS dataset and conduct experiments to demonstrate that our method outperforms baselines in both computational cost and accuracy, achieving state-of-the-art performance.
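
The decomposition described in the abstract can be illustrated with a rough sketch. The code below is not the authors' DAN implementation; it is a minimal NumPy illustration, under the assumption that each frame or support image is flattened into N feature tokens of dimension d, of how routing attention through one agent frame replaces a single (T·N) × (K·N) attention map with two smaller maps. The function names, the `agent_index` parameter, and the use of plain dot-product attention are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    """Scaled dot-product attention; returns the output and the weight map."""
    scale = np.sqrt(queries.shape[-1])
    weights = softmax(queries @ keys.T / scale, axis=-1)
    return weights @ values, weights


def full_attention(query_video, support):
    """Every query-video token attends to every support token.

    query_video: (T, N, d), support: (K, N, d).
    The weight map has (T*N) * (K*N) entries.
    """
    d = query_video.shape[-1]
    q = query_video.reshape(-1, d)
    s = support.reshape(-1, d)
    out, weights = attend(q, s, s)
    return out.reshape(query_video.shape), weights.size


def agent_attention(query_video, support, agent_index=0):
    """Two-step attention routed through one query frame (the "agent").

    Step 1: the agent frame gathers information from the support set.
    Step 2: all query frames gather that information from the agent.
    The two weight maps have N*(K*N) + (T*N)*N entries in total.
    """
    T, N, d = query_video.shape
    q = query_video.reshape(-1, d)
    s = support.reshape(-1, d)
    agent = query_video[agent_index]           # (N, d), hypothetical choice of frame
    agent_ctx, w1 = attend(agent, s, s)        # support -> agent
    out, w2 = attend(q, agent, agent_ctx)      # agent -> whole query video
    return out.reshape(T, N, d), w1.size + w2.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.standard_normal((8, 16, 32))    # T=8 frames, N=16 tokens, d=32
    support = rng.standard_normal((5, 16, 32))  # K=5 support images
    _, full_cost = full_attention(video, support)
    _, agent_cost = agent_attention(video, support)
    print(full_cost, agent_cost)
```

With a fixed number of tokens per frame, the size of the full map grows with the product T·K, while the two agent maps together grow with the sum T + K, which is one way to read the linear-versus-quadratic claim above.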

Keywords

Agent network; Breakings; Linear spaces; Linear time; Many to many; Novel domain; Object information; Query video; Single frames; Video objects segmentations

Discipline

Graphics and Human Computer Interfaces | Numerical Analysis and Scientific Computing

Research Areas

Information Systems and Management

Publication

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Virtual, June 19-25

Last Page

14035

ISBN

9781665445092

Identifier

10.1109/CVPR46437.2021.01382

Publisher

IEEE

City or Country

New Jersey, USA

Citation

CHEN, Haoxin; WU, Hanjie; ZHAO, Nanxuan; REN, Sucheng; and HE, Shengfeng. Delving deep into many-to-many attention for few-shot video object segmentation. (2021). Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Virtual, June 19-25. 14035.
Available at: https://ink.library.smu.edu.sg/sis_research/8527

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Graphics and Human Computer Interfaces Commons, Numerical Analysis and Scientific Computing Commons

Research Collection School Of Computing and Information Systems

Delving deep into many-to-many attention for few-shot video object segmentation
