Publication Type

Journal Article

Version

publishedVersion

Publication Date

10-2017

Abstract

Searching in digital video data for high-level events, such as a parade or a car accident, is challenging when the query is textual and lacks visual example images or videos. Current research in deep neural networks is highly beneficial for the retrieval of high-level events using visual examples, but without examples it is still hard to (1) determine which concepts are useful to pre-train (the Vocabulary challenge) and (2) determine which pre-trained concept detectors are relevant for a given unseen high-level event (the Concept Selection challenge). In our article, we present our Semantic Event Retrieval System, which (1) shows the importance of high-level concepts in a vocabulary for the retrieval of complex and generic high-level events and (2) uses a novel concept selection method (i-w2v) based on semantic embeddings. Our experiments on the international TRECVID Multimedia Event Detection benchmark show that a diverse vocabulary including high-level concepts improves performance on the retrieval of high-level events in videos, and that our novel method outperforms a knowledge-based concept selection method.
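The concept selection idea described in the abstract can be illustrated with a minimal sketch: rank the names of pre-trained concept detectors by embedding similarity to the textual event query and keep the top-scoring ones. This is an illustration of embedding-based selection in general, not the paper's i-w2v method; the toy 3-dimensional vectors below are invented for the example, whereas a real system would use vectors from a trained word2vec model.

```python
import math

# Toy word embeddings (assumption: a real system would load these from a
# trained word2vec model; the 3-d vectors here are illustrative only).
EMBEDDINGS = {
    "parade":   [0.9, 0.1, 0.0],
    "marching": [0.8, 0.2, 0.1],
    "band":     [0.7, 0.3, 0.2],
    "car":      [0.1, 0.9, 0.2],
    "crash":    [0.0, 0.8, 0.4],
    "dog":      [0.2, 0.1, 0.9],
}

def embed(words):
    """Average the word vectors of a phrase (zero vector if none known)."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def select_concepts(event_query, concept_bank, k=2):
    """Rank pre-trained concept detectors by embedding similarity
    to the textual event query and keep the top k."""
    q = embed(event_query.split())
    ranked = sorted(concept_bank,
                    key=lambda c: cosine(q, embed(c.split())),
                    reverse=True)
    return ranked[:k]

print(select_concepts("parade", ["marching band", "car crash", "dog"]))
# → ['marching band', 'dog']
```

With no visual examples available, the query never touches pixel data: the selected detectors' scores on a video would then be combined to rank videos for the unseen event.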

Keywords

Content-based visual information retrieval; multimedia event detection; zero shot; semantics

Discipline

Data Storage Systems | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

ACM Transactions on Multimedia Computing, Communications, and Applications

Volume

13

Issue

4

First Page

1

Last Page

18

ISSN

1551-6857

Identifier

10.1145/3131288

Publisher

Association for Computing Machinery (ACM)
