Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
10-2013
Abstract
Recent years have witnessed extensive studies on distance metric learning (DML) for improving similarity search in multimedia information retrieval tasks. Despite their successes, most existing DML methods suffer from two critical limitations: (i) they typically attempt to learn a linear distance function on the input feature space, and this assumption of linearity limits their capacity to measure similarity over complex patterns in real-world applications; (ii) they are often designed for learning distance metrics on uni-modal data, and so may not effectively handle similarity measures for multimedia objects with multimodal representations. To address these limitations, in this paper, we propose a novel framework of online multimodal deep similarity learning (OMDSL), which aims to optimally integrate multiple deep neural networks pretrained with stacked denoising autoencoders. In particular, the proposed framework explores a unified two-stage online learning scheme that consists of (i) learning a flexible nonlinear transformation function for each individual modality, and (ii) learning to find the optimal combination of multiple diverse modalities simultaneously in a coherent process. We conduct an extensive set of experiments to evaluate the performance of the proposed algorithms on multimodal image retrieval tasks, and the encouraging results validate the effectiveness of the proposed technique.
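The second stage described in the abstract, learning how to combine several modalities online, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the abstract does not state the update rule, so a Hedge-style multiplicative-weights update over triplet constraints is swapped in, plain cosine similarity stands in for the per-modality deep networks, and the names `cosine`, `combined_similarity`, `hedge_update`, `beta`, and `margin` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; a stand-in for a learned per-modality similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(weights, x, y):
    """Weighted sum of per-modality similarities between objects x and y.

    x and y are sequences holding one feature vector per modality.
    """
    return sum(w * cosine(xm, ym) for w, xm, ym in zip(weights, x, y))


def hedge_update(weights, query, pos, neg, beta=0.5, margin=0.1):
    """One online step on a triplet (query, similar item, dissimilar item).

    Assumed rule (not from the paper): a modality that fails to rank the
    similar item above the dissimilar one by `margin` has its weight
    multiplied by `beta`; weights are then renormalised to sum to 1.
    """
    updated = []
    for w, q, p, n in zip(weights, query, pos, neg):
        violated = cosine(q, p) - cosine(q, n) < margin
        updated.append(w * beta if violated else w)
    total = sum(updated)
    return [w / total for w in updated]


# Toy data: modality 0 ranks the triplet correctly, modality 1 does not.
query = ([1.0, 0.0], [1.0, 0.0])
pos = ([0.9, 0.1], [0.0, 1.0])
neg = ([0.0, 1.0], [0.9, 0.1])

weights = [0.5, 0.5]
for _ in range(3):
    weights = hedge_update(weights, query, pos, neg)
```

On this toy triplet the weight of the misleading modality shrinks at every step, so the combined score comes to favour the similar item. In the framework the abstract describes, the per-modality similarity would itself be learned rather than fixed as it is here.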
Keywords
Deep learning, Distance metric learning, Image retrieval, Online learning, Similarity learning
Discipline
Computer Sciences | Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
MM '13: Proceedings of the 21st ACM International Conference on Multimedia: October 21-25, Barcelona, Spain
First Page
153
Last Page
162
ISBN
9781450324045
Identifier
10.1145/2502081.2502112
Publisher
ACM
City or Country
New York
Citation
WU, Pengcheng; HOI, Steven C. H.; XIA, Hao; ZHAO, Peilin; WANG, Dayong; and MIAO, Chunyan.
Online multimodal distance metric learning with application to image retrieval. (2013). MM '13: Proceedings of the 21st ACM International Conference on Multimedia: October 21-25, Barcelona, Spain. 153-162.
Available at: https://ink.library.smu.edu.sg/sis_research/2333
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/2502081.2502112
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons