Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2022
Abstract
Learning discriminative representations from the complex spatio-temporal dynamic space is essential for video recognition. On top of those stylized spatio-temporal computational units, further refining the learnt features with axial contexts has proved promising toward this goal. However, previous works generally focus on utilizing a single kind of context to calibrate entire feature channels and can hardly cope with diverse video activities. The problem could be tackled by using pair-wise spatio-temporal attentions to recompute feature responses with cross-axis contexts, but only at the expense of heavy computation. In this paper, we propose an efficient feature refinement method that decomposes the feature channels into several groups and separately refines them with different axial contexts in parallel. We refer to this lightweight feature calibration as group contextualization (GC). Specifically, we design a family of efficient element-wise calibrators, i.e., ECal-G/S/T/L, whose axial contexts are information dynamics aggregated from other axes either globally or locally, to contextualize the feature channel groups. The GC module can be densely plugged into each residual layer of off-the-shelf video networks. With little computational overhead, consistent improvement is observed when GC is plugged into different networks. By utilizing calibrators to embed features with four different kinds of contexts in parallel, the learnt representation is expected to be more resilient to diverse types of activities. On videos with rich temporal variations, GC empirically boosts the performance of 2D-CNNs (e.g., TSN and TSM) to a level comparable to state-of-the-art video networks. Code is available at https://github.com/haoyanbin918/GroupContextualization.
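The core idea in the abstract, splitting feature channels into groups and recalibrating each group element-wise with a different axial context, can be sketched as follows. This is a hypothetical NumPy illustration, not the paper's implementation: the function `group_contextualize`, the specific pooling choices per group, and the fixed sigmoid gate are assumptions standing in for the learnable ECal-G/S/T/L calibrators.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def group_contextualize(x, num_groups=4):
    """Illustrative sketch of group contextualization (GC).

    x: a video feature map of shape (C, T, H, W). Channels are split
    into `num_groups` groups; each group is recalibrated element-wise
    with a context aggregated along a different set of axes. The real
    ECal-G/S/T/L calibrators are learnable; here the "calibrator" is
    just average pooling followed by a sigmoid gate.
    """
    groups = np.array_split(x, num_groups, axis=0)
    out = []
    for i, g in enumerate(groups):
        if i == 0:    # global-style context: pool over T, H, W
            ctx = g.mean(axis=(1, 2, 3), keepdims=True)
        elif i == 1:  # spatial-style context: pool over T, keep H, W
            ctx = g.mean(axis=1, keepdims=True)
        elif i == 2:  # temporal-style context: pool over H, W, keep T
            ctx = g.mean(axis=(2, 3), keepdims=True)
        else:         # local-style context: per-element response
            ctx = g
        # element-wise recalibration; contexts broadcast over the group
        out.append(g * sigmoid(ctx))
    return np.concatenate(out, axis=0)
```

Because the groups are processed independently and the output shape matches the input, such a module could in principle be inserted after any residual layer without altering the surrounding architecture, which matches the "densely pluggable" property the abstract describes.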
Keywords
Recognition, detection, categorization, retrieval, Action and event recognition, Deep learning architectures and techniques, Efficient learning and inferences
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, June 18-24: Proceedings
First Page
918
Last Page
928
ISBN
9781665469463
Identifier
10.1109/CVPR52688.2022.00100
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
HAO, Yanbin; ZHANG, Hao; NGO, Chong-wah; and HE, Xiangnan.
Group contextualization for video recognition. (2022). 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, June 18-24: Proceedings. 918-928.
Available at: https://ink.library.smu.edu.sg/sis_research/7504
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/CVPR52688.2022.00100
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons