Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

10-2020

Abstract

Understanding fine-grained activities, such as sport highlights, is an overlooked problem that receives comparatively little research attention. Potential reasons include the absence of dedicated fine-grained action benchmark datasets, a research preference for general super-categorical activity classification, and the challenge of large visual similarity between fine-grained actions. To tackle these issues, we collect and manually annotate two sport highlights datasets, Basketball-8 and Soccer-10, for fine-grained action classification. Sample clips in the datasets are annotated with professional sub-categorical actions such as “dunk” and “goalkeeping”. We also propose a Compact Bilinear Augmented Query Structured Attention (CBA-QSA) module and stack it on top of general three-dimensional neural networks in a plug-and-play manner to emphasize important spatio-temporal clues in highlight clips. Specifically, we adapt hierarchical attention neural networks, which contain a learnable query scheme, to video in order to identify discriminative spatial and temporal visual clues within highlight clips. We name this altered attention, which learns separate queries for spatial and temporal features, query structured attention (QSA). Furthermore, we inflate bilinear mapping, a mature technique for representing local pairwise interactions in image-level fine-grained classification, to video understanding. In detail, we extend its compact version, compact bilinear mapping (CBM) based on TensorSketch, to handle the three-dimensional video signal and model local pairwise motion information. Finally, we combine CBM and QSA to form CBA-QSA neural networks for fine-grained sport highlights classification. Experimental results demonstrate that CBA-QSA improves on general state-of-the-art methods on the Basketball-8 and Soccer-10 datasets.
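To make the idea of compact bilinear mapping over a video feature map more concrete, the following is a minimal NumPy sketch of the generic TensorSketch construction, not the paper's implementation. It applies two independent Count Sketch projections to the channel vector at every spatio-temporal location, multiplies them in the Fourier domain (equivalent to a circular convolution), and sum-pools over all locations. The function names, the `(C, T, H, W)` layout, the output dimension, and the sum pooling are all assumptions made for illustration.

```python
import numpy as np


def sketch_matrix(in_dim, out_dim, rng):
    """Build a Count Sketch projection as an (in_dim, out_dim) matrix.

    Each input channel is sent to one random output bucket with a random sign.
    (Hypothetical helper, named here for illustration only.)
    """
    buckets = rng.integers(0, out_dim, size=in_dim)
    signs = rng.choice([-1.0, 1.0], size=in_dim)
    m = np.zeros((in_dim, out_dim))
    m[np.arange(in_dim), buckets] = signs
    return m


def compact_bilinear_pool(feats, m1, m2):
    """Approximate bilinear pooling of a (C, T, H, W) feature map.

    Returns a vector of length out_dim. Sum pooling over locations is an
    assumption of this sketch.
    """
    channels = feats.shape[0]
    out_dim = m1.shape[1]
    # One row per spatio-temporal location: shape (T*H*W, C).
    local = feats.reshape(channels, -1).T
    # Two independent Count Sketches of each local descriptor.
    f1 = np.fft.rfft(local @ m1, axis=1)
    f2 = np.fft.rfft(local @ m2, axis=1)
    # Product in the Fourier domain equals circular convolution of the
    # sketches, which sketches the outer product of the descriptor with itself.
    per_location = np.fft.irfft(f1 * f2, n=out_dim, axis=1)
    return per_location.sum(axis=0)


rng = np.random.default_rng(0)
m1 = sketch_matrix(32, 16, rng)
m2 = sketch_matrix(32, 16, rng)
clip_features = rng.standard_normal((32, 4, 7, 7))  # toy 3D-CNN output
pooled = compact_bilinear_pool(clip_features, m1, m2)
print(pooled.shape)  # (16,)
```

The two projection matrices are drawn once and kept fixed, so the pooled vector has a fixed length regardless of the clip's temporal or spatial extent.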

Keywords

compact bilinear mapping, fine-grained video classification, spatio-temporal attention, sport highlights recognition

Discipline

Computer Sciences | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the 28th ACM International Conference on Multimedia, MM 2020, Seattle, October 12–16

First Page

628

Last Page

636

ISBN

9781450379885

Identifier

10.1145/3394171.3413595

Publisher

Association for Computing Machinery, Inc

City or Country

Virtual Conference
