Publication Type
Journal Article
Version
publishedVersion
Publication Date
10-2017
Abstract
Searching in digital video data for high-level events, such as a parade or a car accident, is challenging when the query is textual and lacks visual example images or videos. Current research in deep neural networks is highly beneficial for the retrieval of high-level events using visual examples, but without examples it is still hard to (1) determine which concepts are useful to pre-train (Vocabulary challenge) and (2) determine which pre-trained concept detectors are relevant for a certain unseen high-level event (Concept Selection challenge). In our article, we present our Semantic Event Retrieval System, which (1) shows the importance of high-level concepts in a vocabulary for the retrieval of complex and generic high-level events and (2) uses a novel concept selection method (i-w2v) based on semantic embeddings. Our experiments on the international TRECVID Multimedia Event Detection benchmark show that a diverse vocabulary including high-level concepts improves performance on the retrieval of high-level events in videos and that our novel method outperforms a knowledge-based concept selection method.
Keywords
Content-based visual information retrieval; multimedia event detection; zero shot; semantics
Discipline
Data Storage Systems | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
ACM Transactions on Multimedia Computing, Communications and Applications
Volume
13
Issue
4
First Page
1
Last Page
18
ISSN
1551-6857
Identifier
10.1145/3131288
Publisher
Association for Computing Machinery (ACM)
Citation
1
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.