Publication Type
Journal Article
Version
publishedVersion
Publication Date
8-2012
Abstract
The main challenge of a search engine is to find information that is relevant and appropriate. However, this can become difficult when queries are issued using ambiguous words. Rijsbergen first hypothesized a clustering approach for web pages wherein closely associated pages are treated as a semantic group with the same relevance to the query (Rijsbergen 1979). In this paper, we extend Rijsbergen’s cluster hypothesis to multimedia content such as images. Given a user query, the polysemy in the returned image set is related to the many possible meanings of the query. We develop a method to cluster the polysemous images into their semantic categories. The resulting clusters can be seen as the visual senses of the query, which collectively embody the visual interpretations of the query. At the heart of our method is a non-parametric Bayesian approach that exploits the complementary text and visual information of images for semantic clustering. Latent structures of polysemous images are mined using the Hierarchical Dirichlet Process (HDP). HDP is a non-parametric Bayesian model that represents images using a mixture of components. The main advantage of our model is that the number of mixture components is not fixed a priori, but is determined during the posterior inference process. This allows our model to grow with the level of polysemy (and visual diversity) of images. The same set of components is used to model all images, with only the mixture weights varying amongst images. Evaluation results on a large collection of web images show the efficacy of our approach.
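The idea of a shared set of components with image-specific mixture weights can be illustrated with a minimal sketch. It assumes a truncated stick-breaking approximation of the HDP prior and is not the inference procedure used in the paper; the function names, the truncation level of 20, and the concentration values gamma=1.0 and alpha=5.0 are hypothetical choices made only for illustration.

```python
import random

def stick_breaking(gamma, truncation, rng):
    """Draw truncated global component weights by stick-breaking.

    Assumption: the infinite sequence is cut at `truncation` components,
    and the last component absorbs whatever stick length remains.
    """
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, gamma)   # fraction broken off the stick
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights

def image_weights(alpha, global_weights, rng):
    """Draw one image's mixture weights as Dirichlet(alpha * global_weights).

    The Dirichlet draw is built from normalized gamma variates; the small
    floor keeps every shape parameter strictly positive.
    """
    draws = [rng.gammavariate(max(alpha * w, 1e-12), 1.0) for w in global_weights]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
# Global weights over the shared components (hypothetical gamma and truncation).
beta = stick_breaking(gamma=1.0, truncation=20, rng=rng)
# Three images: same components, different weights per image.
pis = [image_weights(alpha=5.0, global_weights=beta, rng=rng) for _ in range(3)]
```

In this sketch most of the probability mass typically falls on a few components, so the effective number of components is not set by hand, while every image reuses the same component indices with its own weights.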
Keywords
Hierarchical Dirichlet Process, Non-parametric models, Image clustering, Sense disambiguation
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
Multimedia Tools and Applications
Volume
56
Issue
3
First Page
509
Last Page
534
ISSN
1380-7501
Identifier
10.1007/s11042-010-0615-y
Publisher
Springer (part of Springer Nature): Springer Open Choice Hybrid Journals
Citation
WAN, Kong-Wah; TAN, Ah-hwee; LIM, Joo-Hwee; and CHIA, Liang-Tien.
A non-parametric visual-sense model of images: Extending the cluster hypothesis beyond text. (2012). Multimedia Tools and Applications. 56, (3), 509-534.
Available at: https://ink.library.smu.edu.sg/sis_research/5204
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/s11042-010-0615-y
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons