Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
11-2023
Abstract
As one of the core video semantic understanding tasks, Video Semantic Role Labeling (VidSRL) aims to detect the salient events in given videos by recognizing the predicate-argument event structures and the interrelationships between events. While recent endeavors have put forth methods for VidSRL, they are mostly subject to two key drawbacks: the lack of fine-grained spatial scene perception and insufficient modeling of video temporality. Towards this end, this work explores a novel holistic spatio-temporal scene graph (namely HostSG) representation, based on existing dynamic scene graph structures, which models well both the fine-grained spatial semantics and the temporal dynamics of videos for VidSRL. Built upon the HostSG, we present a niche-targeting VidSRL framework. A scene-event mapping mechanism is first designed to bridge the gap between the underlying scene structure and the high-level event semantic structure, resulting in an overall hierarchical scene-event (termed ICE) graph structure. We further perform iterative structure refinement to optimize the ICE graph, e.g., filtering noisy branches and building new informative connections, such that the overall structure representation can best coincide with the demands of the end task. Finally, the three subtask predictions of VidSRL are jointly decoded, where the end-to-end paradigm effectively avoids error propagation. On the benchmark dataset, our framework significantly outperforms the current best-performing model. Further analyses are presented for a better understanding of the advances of our method. Our HostSG representation shows great potential to facilitate a broader range of other video understanding tasks.
Keywords
video understanding, semantic role labeling, event extraction, scene graph
Discipline
Graphics and Human Computer Interfaces | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, October 29 - November 3
First Page
5281
Last Page
5291
ISBN
979-8-4007-0108-5
Identifier
10.1145/3581783.3612096
Publisher
ACM
City or Country
New York
Citation
ZHAO, Yu; FEI, Hao; CAO, Yixin; LI, Bobo; ZHANG, Meishan; WEI, Jianguo; ZHANG, Min; and CHUA, Tat-Seng.
Constructing holistic spatio-temporal scene graph for video semantic role labeling. (2023). MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, October 29 - November 3. 5281-5291.
Available at: https://ink.library.smu.edu.sg/sis_research/8290
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3581783.3612096
Included in
Graphics and Human Computer Interfaces Commons, Numerical Analysis and Scientific Computing Commons