Publication Type

Journal Article

Version

publishedVersion

Publication Date

3-2009

Abstract

Bag-of-visual-words (BoW) has recently become a popular representation to describe video and image content. Most existing approaches, nevertheless, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic and ontological aspects of visual words for video analysis. Two approaches, soft-weighting and constraint-based earth mover’s distance (CEMD), are proposed to model different aspects of visual word linguistics and proximity. In soft-weighting, visual words are cleverly weighted such that the linguistic meaning of words is taken into account for bin-to-bin histogram comparison. In CEMD, a cross-bin matching algorithm is formulated such that the ground distance measure considers the linguistic similarity of words. In particular, a BoW ontology which hierarchically specifies the hyponym relationship of words is constructed to assist the reasoning. We demonstrate soft-weighting and CEMD on two tasks: video semantic indexing and near-duplicate keyframe retrieval. Experimental results indicate that soft-weighting is superior to other popular weighting schemes such as term frequency (TF) weighting in large-scale video database. In addition, CEMD shows excellent performance compared to cosine similarity in near-duplicate retrieval.

Keywords

CEMD matching, Linguistic similarity, Near-duplicate keyframe, Semantic concept, Soft-weighting, Visual ontology

Discipline

Data Storage Systems | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

Computer Vision and Image Understanding

Volume

113

Issue

First Page

405

Last Page

414

ISSN

1077-3142

Identifier

10.1016/j.cviu.2008.10.002

Publisher

Elsevier

Citation

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Download

Included in

Data Storage Systems Commons, Graphics and Human Computer Interfaces Commons

COinS

Research Collection School Of Computing and Information Systems

Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

Volume

Issue

First Page

Last Page

ISSN

Identifier

Publisher

Citation

Creative Commons License

Included in

Search

Links

Browse

Links

Research Collection School Of Computing and Information Systems

Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval

Author

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

Volume

Issue

First Page

Last Page

ISSN

Identifier

Publisher

Citation

Creative Commons License

Included in

Share

Search

Links

Browse

Links