Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
10-2009
Abstract
The Bag-of-Words (BoW) model is a promising image representation for annotation. One critical limitation of existing BoW models is the semantic loss during the codebook generation process, in which BoW simply clusters visual words in Euclidean space. However, the distance between two visual words in Euclidean space does not necessarily reflect the semantic distance between the two concepts, due to the semantic gap between low-level features and high-level semantics. In this paper, we propose a novel scheme for learning a codebook such that semantically related features are mapped to the same visual word. In particular, we consider the distance between semantically identical features as a measurement of the semantic gap, and attempt to learn an optimized codebook by minimizing this gap. We refer to this new codebook method as the Semantics-Preserving Codebook (SPC) and to the corresponding model as the Semantics-Preserving Bag-of-Words model (SPBoW). The model generates a codebook for each object category and only needs to update the codebook of a specific category when a new object arrives, which makes it convenient to scale up as the number of objects increases. Experiments on image annotation tasks with a public testbed from MIT's LabelMe project, which contains 11,281 objects of 495 categories, show that the SPC learning scheme handles a large number of objects efficiently and greatly improves the performance of the existing BoW model.
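For orientation, the baseline that the abstract criticizes can be sketched as follows. This is a minimal illustration, not the authors' SPC method: it uses plain Euclidean k-means to build a codebook and nearest-word quantization to build a histogram, and it only mimics the per-category organization described above. The learned semantic metric itself is not reproduced. All function names (`kmeans_codebook`, `bow_histogram`, `per_category_codebooks`, `add_category`) and parameters are hypothetical, chosen for this sketch.

```python
import numpy as np

def kmeans_codebook(features, k, iters=20, seed=0):
    """Plain Euclidean k-means; returns k centroids (the 'visual words').

    Assumes `features` is a float array of shape (n, d) with n >= k.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every centroid.
        d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def bow_histogram(features, codebook):
    """Quantize each feature to its nearest visual word and count occurrences."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def per_category_codebooks(features_by_category, k):
    """One independent codebook per object category (assumed layout)."""
    return {cat: kmeans_codebook(f, k) for cat, f in features_by_category.items()}

def add_category(codebooks, name, features, k):
    """Adding a category builds only that category's codebook; others are untouched."""
    codebooks[name] = kmeans_codebook(features, k)
    return codebooks
```

Because each category owns its codebook in this sketch, adding or refreshing one category never requires re-clustering the others, which is the scaling property the abstract attributes to the per-category design. In SPC, the Euclidean distance used here would be replaced by a learned distance that shrinks the gap between semantically identical features.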
Keywords
Distance metric learning, Bag-of-words model, Semantic gap, Image annotation
Discipline
Databases and Information Systems | Data Storage Systems
Research Areas
Data Science and Engineering
Publication
LS-MMRM '09: Proceedings of the First ACM workshop on Large-scale Multimedia Retrieval and Mining, Beijing, October 23
First Page
19
Last Page
26
ISBN
9781605587561
Identifier
10.1145/1631058.1631064
Publisher
ACM
City or Country
New York
Citation
WU, Lei; HOI, Steven C. H.; and YU, Nenghai.
Semantics-preserving bag-of-words models for efficient image annotation. (2009). LS-MMRM '09: Proceedings of the First ACM workshop on Large-scale Multimedia Retrieval and Mining, Beijing, October 23. 19-26.
Available at: https://ink.library.smu.edu.sg/sis_research/4189
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/1631058.1631064