Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

9-2013

Abstract

Automatically inferring ongoing activities enables the early recognition of unfinished activities, which is valuable for applications such as online human-machine interaction and security monitoring. State-of-the-art methods use spatio-temporal interest point (STIP) based features as the low-level video description to handle complex scenes. However, the typical bag-of-visual-words (BoVW) representation focuses on the statistical distribution of features while ignoring the inherent contexts in activity sequences, resulting in low discrimination when dealing directly with limited observations. To solve this problem, this paper adopts the Recurrent Self-Organizing Map (RSOM), which was designed to process sequential data, as a novel high-level representation of ongoing human activities. The key idea is that the currently observed features and their spatio-temporal contexts are encoded as a trajectory over the units of a pre-trained RSOM. Additionally, a combination of Dynamic Time Warping (DTW) distance and Edit distance, named DTW-E, is proposed to measure the structural dissimilarity between RSOM trajectories. Two real-world datasets with markedly different characteristics, complex scenes and inter-class ambiguities, serve as sources of data for evaluation. Experimental results based on kNN classifiers confirm that our approach infers ongoing human activities with high accuracy.
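The DTW-E measure described above combines two classic sequence distances. As a rough illustration only, the sketch below implements plain DTW and Levenshtein Edit distance and blends them with a weighting parameter; the paper's actual DTW-E formulation, the local distance between RSOM units, and the weight `alpha` are not specified here and are assumptions for this sketch.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences
    (e.g., 1-D codes of RSOM best-matching units; illustrative only)."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = DTW distance between prefixes a[:i] and b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between units
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def edit_distance(a, b):
    """Levenshtein Edit distance between two symbol sequences."""
    n, m = len(a), len(b)
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + sub)  # substitution
        prev = cur
    return prev[m]

def dtw_e(a, b, alpha=0.5):
    """Hypothetical weighted blend of the two distances; the paper's
    exact combination rule is not reproduced here."""
    return alpha * dtw_distance(a, b) + (1 - alpha) * edit_distance(a, b)
```

Such a dissimilarity could then be plugged directly into a kNN classifier over labeled RSOM trajectories, matching the evaluation setup mentioned in the abstract.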

Keywords

Activity inference, Recurrent Self-Organizing Map, spatio-temporal contexts

Discipline

Computer Engineering | Software Engineering

Research Areas

Data Science and Engineering

Publication

Proceedings of the 24th British Machine Vision Conference (BMVC 2013), Bristol, September 9-13

First Page

1

Last Page

11

Identifier

10.5244/C.27.11

Publisher

BMVA

City or Country

Bristol

Additional URL

https://doi.org/10.5244/C.27.11
