Publication Type

Conference Proceeding Article

Version

Published version

Publication Date

10-2014

Abstract

Human action recognition is challenging mainly due to intra-class variation, inter-class ambiguity, and cluttered backgrounds in real videos. The bag-of-visual-words model utilizes spatio-temporal interest points (STIPs) and represents an action by the distribution of these points, which ignores the visual context among them. To add more contextual information, we propose a method that encodes the spatio-temporal distribution of weighted pairwise points. First, STIPs are extracted from an action sequence and clustered into visual words. Then, each word is weighted in both the temporal and spatial domains to capture its relationships with other words. Finally, the directional relationships between co-occurring word pairs are used to encode visual context. We report state-of-the-art results on the Rochester and UT-Interaction datasets, validating that our method classifies human actions with high accuracy.

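The abstract's pipeline can be made concrete with a minimal Python sketch; this is an illustration, not the authors' implementation. STIP detection (e.g., a space-time interest point detector with HOG/HOF descriptors) is assumed to have been run already, the paper's temporal and spatial word weighting is omitted for brevity, and the vocabulary size, neighbourhood radius, and eight direction bins are placeholder assumptions.

# Illustrative sketch (assumptions noted above): a bag-of-visual-words
# histogram augmented with a directional pairwise co-occurrence descriptor,
# built from pre-extracted STIP locations and descriptors.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, k=200, seed=0):
    """Cluster STIP descriptors into k visual words (standard k-means step)."""
    return KMeans(n_clusters=k, random_state=seed).fit(descriptors)

def encode_action(points_xyt, descriptors, vocab, radius=40.0, n_dir=8):
    """Encode one action clip.

    points_xyt : (n, 3) array of STIP locations (x, y, t)
    descriptors: (n, d) array of STIP descriptors
    Returns the word histogram concatenated with a directional
    co-occurrence tensor over word pairs within `radius` pixels.
    """
    k = vocab.n_clusters
    words = vocab.predict(descriptors)               # assign each point a word
    hist = np.bincount(words, minlength=k).astype(float)

    # Directional pairwise context: for each ordered pair of nearby points,
    # count the word pair, binned by the direction from one point to the other.
    cooc = np.zeros((k, k, n_dir))
    for i in range(len(words)):
        for j in range(len(words)):
            if i == j:
                continue
            d = points_xyt[j, :2] - points_xyt[i, :2]
            if np.linalg.norm(d) > radius:
                continue
            ang = np.arctan2(d[1], d[0]) % (2 * np.pi)
            b = int(ang / (2 * np.pi / n_dir)) % n_dir
            cooc[words[i], words[j], b] += 1

    feat = np.concatenate([hist, cooc.ravel()])
    return feat / max(feat.sum(), 1e-9)              # L1-normalise the descriptor

The resulting fixed-length vectors can then be fed to any off-the-shelf classifier (e.g., an SVM), which is the usual final step for bag-of-words action recognition.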
Keywords

Human action recognition, spatio-temporal interest points, bag-of-words

Discipline

Computer Engineering | Software Engineering

Research Areas

Data Science and Engineering

Publication

2014 IEEE International Conference on Image Processing (ICIP 2014), Paris, October 27-30

First Page

1460

Last Page

1464

Identifier

10.1109/ICIP.2014.7025292

Publisher

IEEE

City or Country

Paris

Additional URL

https://doi.org/10.1109/ICIP.2014.7025292
