Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

7-2007

Abstract

Near-duplicate keyframes (NDK) play a unique role in large-scale video search, news topic detection and tracking. In this paper, we propose a novel NDK retrieval approach that exploits both visual and textual cues, drawn from a visual vocabulary and from semantic context respectively. The vocabulary, which provides entries for visual keywords, is formed by clustering local keypoints. The semantic context is inferred from the speech transcript surrounding a keyframe. We evaluate the usefulness of visual keywords and semantic context, separately and jointly, using cosine similarity and language models. By linearly fusing both modalities, we report improved performance over techniques based on keypoint matching. While keypoint matching is computationally expensive because it requires online nearest-neighbor search, our approach is effective and efficient enough for online video search.
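
The linear fusion described in the abstract can be illustrated with a minimal sketch. It assumes each keyframe is already represented by two sparse count vectors: one over visual keywords (cluster IDs of quantized keypoints) and one over transcript words. The function names, the weight `alpha`, and the toy data are hypothetical and are not taken from the paper; only the cosine-similarity variant is shown, not the language-model scoring.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def fused_score(visual_q, visual_k, text_q, text_k, alpha=0.5):
    """Linear fusion of visual and textual similarity.

    alpha is a hypothetical weight in [0, 1]; the paper's actual
    weighting scheme is not given in the abstract.
    """
    return alpha * cosine(visual_q, visual_k) + (1.0 - alpha) * cosine(text_q, text_k)


# Toy keyframes: visual keyword IDs and surrounding transcript words.
query_visual = Counter({"vw12": 3, "vw7": 1})
cand_visual = Counter({"vw12": 2, "vw40": 1})
query_text = Counter("flood rescue city".split())
cand_text = Counter("flood rescue team".split())

score = fused_score(query_visual, cand_visual, query_text, cand_text, alpha=0.6)
```

Candidates would be ranked by `score` in descending order. Because both similarities are computed over precomputed sparse vectors, no nearest-neighbor search over raw keypoints is needed at query time, which is the efficiency argument the abstract makes.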

Keywords

Image retrieval, Language model, Multiple modalities, Near-duplicate keyframe, News videos, Similarity measure

Discipline

Databases and Information Systems | Theory and Algorithms

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR 2007, Amsterdam, July 9 - 11

First Page

162

Last Page

169

ISBN

9781595937339

Identifier

10.1145/1282280.1282309

Publisher

ACM

City or Country

Amsterdam
