Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

9-2013

Abstract

Spatio-temporal interest point (STIP) based features show great promise for human action analysis, offering high efficiency and robustness. However, they are typically used within a bag-of-visual-words (BoVW) representation, which discards the correlations among words and shows limited discrimination on real-world videos. In this paper, we propose a novel approach that augments BoVW with the spatio-temporal co-occurrence relationships of visual words for a richer representation. Rather than assigning a particular scale to videos, we adopt the normalized Google-like distance (NGLD) to measure the co-occurrence semantics of words, which captures the structural information of videos in a statistical way. All pairwise distances in the spatial and temporal domains compose the corresponding NGLD correlograms; their combined form is then incorporated with BoVW by training a multi-channel kernel SVM classifier. Experiments on real-world datasets (KTH and UCF Sports) validate the effectiveness of our approach for the classification of human actions.
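The central measure in the abstract is the normalized Google-like distance between pairs of visual words, computed from their occurrence and co-occurrence statistics. The Python sketch below illustrates the standard NGD-style formula applied to visual-word counts and the construction of a pairwise distance matrix (correlogram); the function names, the neighbourhood-based counting, and the correlogram layout are assumptions for illustration only and may differ from the paper's exact formulation.

```python
import numpy as np

def ngld(f_x, f_y, f_xy, n_total):
    """Normalized Google-like distance between two visual words.

    f_x, f_y : occurrence counts of words x and y (e.g. the number of
               spatio-temporal neighbourhoods containing each word)
    f_xy     : count of neighbourhoods containing both x and y
    n_total  : total number of neighbourhoods considered
    Returns a non-negative distance; 0 means the words always co-occur.
    """
    if f_xy == 0:
        return np.inf  # words never co-occur
    log_fx, log_fy = np.log(f_x), np.log(f_y)
    num = max(log_fx, log_fy) - np.log(f_xy)
    den = np.log(n_total) - min(log_fx, log_fy)
    return num / den

def ngld_correlogram(counts, cooccurrence, n_total):
    """Pairwise NGLD matrix over a visual vocabulary.

    counts       : (V,) occurrence count per visual word
    cooccurrence : (V, V) co-occurrence counts within a spatial or
                   temporal neighbourhood
    """
    v = len(counts)
    d = np.zeros((v, v))
    for i in range(v):
        for j in range(v):
            d[i, j] = ngld(counts[i], counts[j], cooccurrence[i, j], n_total)
    return d
```

In this sketch, separate spatial and temporal co-occurrence matrices would each yield their own correlogram, and the two could then be combined with the BoVW histogram as channels of a multi-channel kernel SVM, as described in the abstract.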

Keywords

Human action recognition, spatio-temporal interest points, bag-of-words, co-occurrence

Discipline

Computer Engineering | Software Engineering

Research Areas

Data Science and Engineering

Publication

2013 IEEE International Conference on Image Processing (ICIP 2013), Melbourne, September 15-18

First Page

1

Last Page

5

Identifier

10.1109/ICIP.2013.6738663

Publisher

IEEE

City or Country

Melbourne

Additional URL

https://doi.org/10.1109/ICIP.2013.6738663
