Egocentric temporal action proposals

Publication Type

Journal Article

Publication Date

11-2017

Abstract

We present an approach for localizing generic actions in egocentric videos by generating temporal action proposals (TAPs), with the goal of accelerating the action recognition step. An egocentric TAP is a sequence of frames that may contain a generic action performed by the wearer of a head-mounted camera, e.g., taking a knife, spreading jam, pouring milk, or cutting carrots. Inspired by object proposals, this paper aims to generate a small number of TAPs that localize all action events in the input video, thereby replacing the popular sliding window strategy. To this end, we first temporally segment the input video into action atoms, the smallest units that may contain an action. We then apply a hierarchical clustering algorithm with several egocentric cues to generate TAPs. Finally, we propose two actionness networks to score the likelihood that each TAP contains an action. The top-ranked candidates are returned as output TAPs. Experimental results show that the proposed TAP detection framework performs significantly better than relevant approaches for egocentric action detection.
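
The pipeline outlined in the abstract (atoms, hierarchical grouping, actionness scoring, top-ranked output) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: frames are represented by a single scalar feature, atoms are fixed-length chunks, adjacent segments are merged by the absolute difference of their mean features, and `toy_actionness` is a hypothetical stand-in for the two actionness networks. The egocentric cues mentioned in the abstract are not modeled.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def split_into_atoms(num_frames: int, atom_len: int) -> List[Segment]:
    """Assumed stand-in for temporal segmentation: fixed-length atoms."""
    return [(s, min(s + atom_len, num_frames))
            for s in range(0, num_frames, atom_len)]


def mean_feature(features: Sequence[float], seg: Segment) -> float:
    """Average of the per-frame scalar feature over a segment."""
    start, end = seg
    return sum(features[start:end]) / (end - start)


def hierarchical_proposals(features: Sequence[float],
                           atoms: List[Segment]) -> List[Segment]:
    """Bottom-up grouping: repeatedly merge the most similar pair of
    temporally adjacent segments. Every segment formed along the way
    is kept as a candidate proposal."""
    current = list(atoms)
    proposals = list(atoms)
    while len(current) > 1:
        gaps = [abs(mean_feature(features, current[i])
                    - mean_feature(features, current[i + 1]))
                for i in range(len(current) - 1)]
        i = gaps.index(min(gaps))  # first most-similar adjacent pair
        merged = (current[i][0], current[i + 1][1])
        current[i:i + 2] = [merged]
        proposals.append(merged)
    return proposals


def top_k_proposals(features: Sequence[float], atom_len: int,
                    score_fn: Callable[[Segment], float],
                    k: int) -> List[Segment]:
    """Generate candidates, rank them by score, return the best k."""
    atoms = split_into_atoms(len(features), atom_len)
    candidates = hierarchical_proposals(features, atoms)
    return sorted(candidates, key=score_fn, reverse=True)[:k]


# Toy input: 1.0 marks frames inside an action, 0.0 marks background.
features = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]


def toy_actionness(seg: Segment) -> float:
    """Hypothetical score: +1 per action frame, -1 per background frame."""
    return sum(2 * f - 1 for f in features[seg[0]:seg[1]])


best = top_k_proposals(features, atom_len=2, score_fn=toy_actionness, k=1)
print(best)
```

With N atoms, this grouping produces 2N - 1 candidate segments, which is far fewer than the number of windows an exhaustive multi-scale sliding window would evaluate on a long video.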

Keywords

Atom optics, Optical imaging, Proposals, Temporal action proposals, Video processing, Videos

Discipline

Databases and Information Systems

Research Areas

Information Systems and Management

Publication

IEEE Transactions on Image Processing

Volume

27

Issue

2

First Page

764

Last Page

777

ISSN

1057-7149

Identifier

10.1109/TIP.2017.2772904

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://doi.org/10.1109/TIP.2017.2772904

