Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
10-2020
Abstract
Understanding fine-grained activities, such as sport highlights, is an overlooked problem that receives comparatively little research attention. Potential reasons include the absence of benchmark datasets dedicated to fine-grained actions, the research community's preference for classifying general super-categorical activities, and the challenge posed by the large visual similarity between fine-grained actions. To tackle these issues, we collect and manually annotate two sport highlights datasets, Basketball-8 and Soccer-10, for fine-grained action classification. Sample clips in the datasets are annotated with professional sub-categorical actions such as “dunk” and “goalkeeping”. We also propose a Compact Bilinear Augmented Query Structured Attention (CBA-QSA) module, which can be stacked on top of general three-dimensional neural networks in a plug-and-play manner to emphasize important spatio-temporal clues in highlight clips. Specifically, we adapt hierarchical attention neural networks, which contain a learnable query scheme, to video in order to identify discriminative spatial and temporal visual clues within highlight clips. We call this modified attention, which learns a separate query for spatial features and for temporal features, query structured attention (QSA). Furthermore, we extend bilinear mapping, a mature technique for representing local pairwise interactions in image-level fine-grained classification, to video understanding. In detail, we extend its compact version (i.e., compact bilinear mapping (CBM) based on TensorSketch) to handle three-dimensional video signals and thereby model local pairwise motion information. Finally, we combine CBM and QSA to form the CBA-QSA neural network for fine-grained sport highlights classification. Experimental results demonstrate that CBA-QSA improves on general state-of-the-art methods on the Basketball-8 and Soccer-10 datasets.
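The abstract names TensorSketch-based compact bilinear mapping (CBM) as the way pairwise feature interactions are kept tractable. The sketch below illustrates only the generic TensorSketch idea: the outer product x ⊗ y of two C-dimensional features is approximated in D dimensions by circularly convolving two count sketches, which preserves inner products in expectation. Everything else is an assumption for illustration and is not taken from the paper: the function names are hypothetical, the 3D extension is taken to be a plain sum of TS(x_p ⊗ x_p) over every (t, h, w) position of a T x H x W x C feature map, and the FFT is replaced by a direct circular convolution so the example needs only the standard library.

```python
# Hypothetical sketch of TensorSketch-based compact bilinear pooling over a
# T x H x W x C video feature map. Names and the pooling scheme are assumptions
# for illustration, not the CBA-QSA authors' implementation.
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Sketch = Tuple[List[int], List[int]]  # (hash indices h, random signs s)


def make_count_sketch(in_dim: int, out_dim: int, rng: random.Random) -> Sketch:
    """Draw a random hash h: [in_dim] -> [out_dim] and signs s: [in_dim] -> {-1, +1}."""
    h = [rng.randrange(out_dim) for _ in range(in_dim)]
    s = [rng.choice((-1, 1)) for _ in range(in_dim)]
    return h, s


def count_sketch(x: Sequence[float], sk: Sketch, out_dim: int) -> Vector:
    """Project x to out_dim dimensions: y[h[i]] += s[i] * x[i]."""
    h, s = sk
    y = [0.0] * out_dim
    for i, xi in enumerate(x):
        y[h[i]] += s[i] * xi
    return y


def circular_convolve(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Direct O(d^2) circular convolution; an FFT computes the same result faster."""
    d = len(a)
    out = [0.0] * d
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += a[i] * b[j]
    return out


def tensor_sketch(x: Sequence[float], y: Sequence[float],
                  sk1: Sketch, sk2: Sketch, out_dim: int) -> Vector:
    """Sketch the outer product x ⊗ y without materialising the C x C matrix."""
    return circular_convolve(count_sketch(x, sk1, out_dim),
                             count_sketch(y, sk2, out_dim))


def compact_bilinear_pool_3d(features, sk1: Sketch, sk2: Sketch, out_dim: int) -> Vector:
    """Sum TS(x_p ⊗ x_p) over every (t, h, w) position p of a T x H x W x C map."""
    pooled = [0.0] * out_dim
    for frame in features:      # T frames
        for row in frame:       # H rows
            for x in row:       # W positions, each a C-dim feature vector
                z = tensor_sketch(x, x, sk1, sk2, out_dim)
                for k in range(out_dim):
                    pooled[k] += z[k]
    return pooled


if __name__ == "__main__":
    rng = random.Random(0)
    C, D = 8, 16  # input channels, sketch (output) dimension
    sk1, sk2 = make_count_sketch(C, D, rng), make_count_sketch(C, D, rng)
    # Toy clip: T=2 frames, H=W=3, C=8 random channels.
    clip = [[[[rng.gauss(0.0, 1.0) for _ in range(C)] for _ in range(3)]
             for _ in range(3)] for _ in range(2)]
    descriptor = compact_bilinear_pool_3d(clip, sk1, sk2, D)
    print(len(descriptor))  # 16
```

Because TensorSketch is linear in the outer product, summing TS(x_p ⊗ x_p) over positions is the same as sketching the pooled second-order matrix Σ_p x_p x_pᵀ, so the descriptor stays D-dimensional instead of C². Practical implementations usually compute the convolution with an FFT and often add signed square-root and L2 normalisation; both are omitted here for brevity.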
Keywords
compact bilinear mapping, fine-grained video classification, spatio-temporal attention, sport highlights recognition
Discipline
Computer Sciences | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 28th ACM International Conference on Multimedia, MM 2020, Seattle, October 12–16
First Page
628
Last Page
636
ISBN
9781450379885
Identifier
10.1145/3394171.3413595
Publisher
Association for Computing Machinery, Inc
City or Country
Virtual Conference
Citation
HAO, Yanbin; ZHANG, Hao; NGO, Chong-wah; LIU, Qing; and HU, Xiaojun. Compact bilinear augmented query structured attention for sport highlights classification. (2020). Proceedings of the 28th ACM International Conference on Multimedia, MM 2020, Seattle, October 12–16. 628-636.
Available at: https://ink.library.smu.edu.sg/sis_research/6483
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.