Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
10-2009
Abstract
This paper presents a novel motion localization approach for recognizing actions and events in real videos. Examples include StandUp and Kiss in Hollywood movies. The challenge can be attributed to the large visual and motion variations imposed by realistic action poses. Previous work mainly focuses on learning from descriptors of cuboids around space-time interest points (STIP) to characterize actions. The size, shape and space-time position of the cuboids are fixed without considering the underlying motion dynamics. This often results in a large set of fragmented cuboids which fail to capture the long-term dynamic properties of realistic actions. This paper proposes detecting spatio-temporal motion volumes (Volumes of Interest, VOIs) whose scale and position adapt to the motion in order to localize actions. First, motions are described as bags of point trajectories by tracking keypoints along the time dimension. VOIs are then adaptively extracted by clustering the trajectories on the motion manifold. The resulting VOIs, of varying scales and centered at arbitrary positions depending on motion dynamics, are finally described by SIFT and 3D gradient features for action recognition. Compared with fixed-size cuboids, VOIs allow comprehensive modeling of long-term motion and show better capability in capturing contextual information associated with motion dynamics. Experiments on a realistic Hollywood movie dataset show that the proposed approach can achieve a 20% relative improvement over the state-of-the-art STIP-based algorithm.
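The trajectory-clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper clusters on a learned motion manifold, whereas the sketch below assumes a simple hand-made feature per trajectory (mean position, mean time, mean per-frame displacement) and a plain flat-kernel mean shift. The function names, the feature choice and the bandwidth value are all hypothetical, and the SIFT and 3D gradient description of each VOI is not covered.

```python
from math import dist  # Python 3.8+


def trajectory_feature(traj):
    """Summarize one keypoint trajectory, given as a list of (t, x, y) samples.

    Assumed feature (not from the paper): mean x, mean y, mean t,
    and mean per-frame displacement in x and y.
    """
    n = len(traj)
    mean_t = sum(p[0] for p in traj) / n
    mean_x = sum(p[1] for p in traj) / n
    mean_y = sum(p[2] for p in traj) / n
    steps = max(n - 1, 1)
    dx = (traj[-1][1] - traj[0][1]) / steps
    dy = (traj[-1][2] - traj[0][2]) / steps
    return (mean_x, mean_y, mean_t, dx, dy)


def mean_shift(points, bandwidth, iters=50, tol=1e-6):
    """Flat-kernel mean shift. Returns one cluster label per input point."""
    modes = []
    for p in points:
        mode = p
        for _ in range(iters):
            nbrs = [q for q in points if dist(mode, q) <= bandwidth]
            if not nbrs:  # defensive guard against rounding at the boundary
                break
            shifted = tuple(sum(c) / len(nbrs) for c in zip(*nbrs))
            converged = dist(shifted, mode) < tol
            mode = shifted
            if converged:
                break
        modes.append(mode)

    # Merge modes that ended up close to each other into one cluster.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def extract_vois(trajs, bandwidth):
    """Cluster trajectories and return, per cluster label, the bounding volume
    (t_min, t_max, x_min, x_max, y_min, y_max) of its member trajectories."""
    labels = mean_shift([trajectory_feature(t) for t in trajs], bandwidth)
    boxes = {}
    for traj, lab in zip(trajs, labels):
        for t, x, y in traj:
            b = boxes.setdefault(lab, [t, t, x, x, y, y])
            b[0], b[1] = min(b[0], t), max(b[1], t)
            b[2], b[3] = min(b[2], x), max(b[3], x)
            b[4], b[5] = min(b[4], y), max(b[5], y)
    return {lab: tuple(b) for lab, b in boxes.items()}
```

Because each volume is the bounding box of whichever trajectories fall into a cluster, its size and position follow the motion rather than being fixed in advance, which is the property the abstract contrasts with fixed-size cuboids. In practice the position and velocity components would need to be scaled relative to each other before clustering; the sketch leaves them unscaled for brevity.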
Keywords
Human action recognition, Keypoint trajectory, Mean-shift clustering, Motion subspace learning, Realistic videos
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 17th ACM International Conference on Multimedia, MM'09, Beijing, China, October 19-24, 2009
First Page
505
Last Page
508
ISBN
9781605586083
Identifier
10.1145/1631272.1631342
Publisher
ACM
City or Country
Beijing, China
Citation
1
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.