Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
9-2013
Abstract
Spatio-temporal interest point (STIP) based features show great promise for human action analysis, with high efficiency and robustness. However, they are typically used in a bag-of-visual-words (BoVW) representation, which ignores any correlation among words and shows limited discrimination on real-world videos. In this paper, we propose a novel approach that augments BoVW with the spatio-temporal co-occurrence relationships of visual words for a richer representation. Rather than fixing a particular scale on videos, we adopt the normalized Google-like distance (NGLD) to measure the co-occurrence semantics of words, which captures the videos' structural information statistically. All pairwise distances in the spatial and temporal domains compose the corresponding NGLD correlograms; their combined form is then incorporated with BoVW by training a multi-channel kernel SVM classifier. Experiments on real-world datasets (KTH and UCF sports) validate the efficiency of our approach for the classification of human actions.
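The NGLD measure referenced in the abstract follows the general form of the normalized Google distance applied to visual-word occurrence counts. A minimal sketch, assuming the standard NGD formula; the function and variable names here are illustrative, not taken from the paper:

```python
import math

def ngld(f_x, f_y, f_xy, n):
    """Normalized Google-like distance between two visual words.

    f_x, f_y : occurrence counts of words x and y in the video corpus
    f_xy     : co-occurrence count within a spatial or temporal neighborhood
    n        : total number of quantized STIP features

    Returns a value near 0 when the words almost always co-occur and
    near 1 (or above) when they rarely do. (Illustrative sketch only.)
    """
    if f_xy == 0:
        return 1.0  # treat never-co-occurring words as maximally distant
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

In the approach described, such pairwise distances would be computed separately over spatial and temporal neighborhoods to form the two NGLD correlograms that are then combined with the BoVW histogram in a multi-channel kernel SVM.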
Keywords
Human action recognition, spatio-temporal interest points, bag-of-words, co-occurrence
Discipline
Computer Engineering | Software Engineering
Research Areas
Data Science and Engineering
Publication
2013 IEEE International Conference on Image Processing (ICIP 2013), Melbourne, September 15-18
First Page
1
Last Page
5
Identifier
10.1109/ICIP.2013.6738663
Publisher
IEEE
City or Country
Melbourne
Citation
SUN, Qianru and LIU, Hong.
Learning spatio-temporal co-occurrence correlograms for efficient human action classification. (2013). 2013 IEEE International Conference on Image Processing (ICIP 2013), Melbourne, September 15-18. 1-5.
Available at: https://ink.library.smu.edu.sg/sis_research/4465
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/ICIP.2013.6738663