Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

11-2012

Abstract

Classifying realistic human actions in video remains challenging because of intra-class variability and inter-class ambiguity among action classes. Recently, Spatial-Temporal Interest Point (STIP) based local features have shown great promise in complex action analysis. However, these methods typically rely on the Bag-of-Words (BoW) algorithm, which can hardly resolve ambiguity between actions because it ignores the spatial-temporal occurrence relations of visual words. In this paper, we propose a new model that captures this contextual relationship in terms of the co-occurrence of pairwise features. Normalized Google-Like Distance (NGLD) is proposed to measure this co-occurrence numerically, owing to its effectiveness in semantic correlation analysis. All pairwise distances compose an NGLD correlogram, and its normalized form is incorporated into the final action representation. Experiments on the WEIZMANN dataset and the more challenging UCF sports dataset show that it is a much richer descriptor that observably reduces action ambiguity. Results also demonstrate that the proposed model is more effective and robust than BoW across different setups.
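
The abstract does not state the NGLD formula, so the following is only an illustrative sketch. It assumes NGLD takes the same form as the well-known Normalized Google Distance, and it assumes that co-occurrence is counted over "contexts", each being the set of visual-word indices found in one local spatial-temporal neighbourhood. The function names `ngd` and `correlogram`, the context definition, and the handling of words that never co-occur are all hypothetical choices made for this example, not the authors' method.

```python
import math
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from occurrence counts (assumed form of NGLD).

    fx, fy: number of contexts containing word x / word y
    fxy:    number of contexts containing both words
    n:      total number of contexts
    """
    if fx == 0 or fy == 0 or fxy == 0:
        # Assumption: words that never co-occur are treated as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    if denom == 0:
        # Both words occur in every context, so they are indistinguishable.
        return 0.0
    return (max(lx, ly) - lxy) / denom


def correlogram(contexts, vocab_size):
    """Build a symmetric vocab_size x vocab_size matrix of pairwise distances."""
    n = len(contexts)
    single = [0] * vocab_size
    pair = {}
    for ctx in contexts:
        words = sorted(set(ctx))  # count each word once per context
        for w in words:
            single[w] += 1
        for a, b in combinations(words, 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1

    dist = [[0.0] * vocab_size for _ in range(vocab_size)]
    for a in range(vocab_size):
        for b in range(a + 1, vocab_size):
            d = ngd(single[a], single[b], pair.get((a, b), 0), n)
            dist[a][b] = dist[b][a] = d
    return dist


# Toy example: four contexts over a three-word vocabulary.
toy_contexts = [[0, 1], [0, 1], [2], [0, 2]]
toy_dist = correlogram(toy_contexts, vocab_size=3)
```

In this toy input, words 0 and 1 share two contexts and end up closer than words 0 and 2, which share only one, while words 1 and 2 never co-occur. A BoW histogram alone would not expose these pairwise relations, which is the gap the abstract says the correlogram is meant to fill.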

Keywords

Human action recognition, Spatial-Temporal Interest Point, Normalized Google-Like Distance

Discipline

Computer Engineering | Software Engineering

Research Areas

Data Science and Engineering

Publication

Proceedings of the 11th Asian Conference on Computer Vision (ACCV 2012), Daejeon, Korea, November 5-9

First Page

1

Last Page

12

Identifier

10.1007/978-3-642-37431-9_33

City or Country

Seoul

Additional URL

https://doi.org/10.1007/978-3-642-37431-9_33
