Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

7-2025

Abstract

We introduce the task of Audible Action Temporal Localization, which aims to identify the spatiotemporal coordinates of audible movements. Unlike conventional tasks such as action recognition and temporal action localization, which broadly analyze video content, our task focuses on the distinct kinematic dynamics of audible actions. It is based on the premise that key actions are driven by inflectional movements; for example, collisions that produce sound often involve abrupt changes in motion. To capture this, we propose TA2Net, a novel architecture that estimates inflectional flow using the second derivative of motion to determine collision timings without relying on audio input. TA2Net also integrates a self-supervised spatial localization strategy during training, combining contrastive learning with spatial analysis. This dual design improves temporal localization accuracy and simultaneously identifies sound sources within video frames. To support this task, we introduce a new benchmark dataset, Audible623, derived from Kinetics and UCF101 by removing non-essential vocalization subsets. Extensive experiments confirm the effectiveness of our approach on Audible623 and show strong generalizability to other domains, such as repetitive counting and sound source localization. Code and dataset are available at https://github.com/WenlongWan/Audible623.
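The abstract's core intuition, that sound-producing collisions coincide with abrupt changes in motion and therefore show up as large second temporal derivatives, can be illustrated with a toy sketch. The snippet below is not TA2Net: it substitutes plain frame differencing for the paper's learned inflectional flow and uses a hypothetical z-score threshold to flag candidate collision frames, purely to make the second-derivative idea concrete.

```python
"""Illustrative sketch only: approximates 'abrupt motion change' with frame
differencing and a second-order temporal difference. All names and thresholds
here are hypothetical, not part of the TA2Net architecture."""

import numpy as np


def motion_magnitude(frames: np.ndarray) -> np.ndarray:
    """Per-frame motion proxy: mean absolute difference between consecutive
    grayscale frames. `frames` has shape (T, H, W) with values in [0, 1]."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))  # (T-1, H, W)
    return diffs.mean(axis=(1, 2))                              # (T-1,)


def candidate_collision_frames(frames: np.ndarray, z_thresh: float = 2.0) -> np.ndarray:
    """Return frame indices where the second derivative of the motion proxy
    is unusually large, i.e. rough stand-ins for 'inflectional' moments."""
    m = motion_magnitude(frames)
    accel = np.abs(np.diff(m, n=2))  # second temporal derivative of motion
    z = (accel - accel.mean()) / (accel.std() + 1e-8)
    # +2 compensates for the two differencing steps so indices refer to frames.
    return np.flatnonzero(z > z_thresh) + 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((64, 32, 32)).astype(np.float32) * 0.05
    clip[30:] += 0.5  # synthetic abrupt change around frame 30
    print(candidate_collision_frames(clip))
```

On the synthetic clip, the spike in the second difference lands near the injected change at frame 30; the actual method localizes such moments with learned features rather than raw pixel differences.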

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the Forty-Second International Conference on Machine Learning, Vancouver, Canada, 2025 July 13-19

First Page

1

Last Page

17

City or Country

Vancouver, Canada
