Publication Type

Journal Article

Version

acceptedVersion

Publication Date

1-2011

Abstract

Just as humans perceive the world by gathering information from multiple sources and modalities, learning from multiple modalities has become an effective scheme for various information retrieval problems. In this paper, we propose a novel multi-modality fusion approach for video search, where the search modalities are derived from a diverse set of knowledge sources, such as text transcripts from speech recognition, low-level visual features from video frames, and high-level semantic visual concepts from supervised learning. Since the effectiveness of each search modality depends greatly on the specific user query, promptly determining the importance of a modality to a query is a critical issue in multi-modality search. Our proposed approach, named concept-driven multi-modality fusion (CDMF), explores a large set of predefined semantic concepts to compute multi-modality fusion weights in a novel way. Specifically, in CDMF, we decompose the query-modality relationship into two components that are much easier to compute: query-concept relatedness and concept-modality relevancy. The former can be efficiently estimated online using semantic and visual mapping techniques, while the latter can be computed offline based on the concept detection accuracy of each modality. Such a decomposition enables adaptive learning of fusion weights for each user query on the fly, in contrast to existing approaches, which mostly adopt predefined query classes and/or modality weights. Experimental results on the TREC Video Retrieval Evaluation (TRECVID) 2005-2008 datasets validate the effectiveness of our approach, which outperforms existing multi-modality fusion methods and achieves near-optimal performance (relative to oracle fusion) for many test queries.
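The decomposition described in the abstract can be illustrated with a small sketch. The abstract does not give the exact combination formula, so the code below assumes a simple one: each modality's weight is the sum over concepts of query-concept relatedness times concept-modality relevancy, normalized to sum to one, followed by a linear weighted fusion of per-modality scores. The function names, the example concepts, and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical modality names, loosely following the sources named in the abstract.
MODALITIES: List[str] = ["text", "visual", "concept"]

# Offline table (assumed values): how reliable each modality is for each concept.
CONCEPT_MODALITY: Dict[str, Dict[str, float]] = {
    "car": {"text": 0.2, "visual": 0.3, "concept": 0.5},
    "road": {"text": 0.1, "visual": 0.5, "concept": 0.4},
}


def fusion_weights(query_concept: Dict[str, float],
                   concept_modality: Dict[str, Dict[str, float]],
                   modalities: List[str]) -> Dict[str, float]:
    """Assumed rule: weight(m) is proportional to sum_c relatedness(q, c) * relevancy(c, m)."""
    raw = {m: 0.0 for m in modalities}
    for concept, relatedness in query_concept.items():
        relevancy = concept_modality.get(concept, {})
        for m in modalities:
            raw[m] += relatedness * relevancy.get(m, 0.0)
    total = sum(raw.values())
    if total == 0.0:
        # No related concept found: fall back to uniform weights.
        return {m: 1.0 / len(modalities) for m in modalities}
    return {m: value / total for m, value in raw.items()}


def fuse_scores(weights: Dict[str, float],
                modality_scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Linear fusion: each shot's score is the weighted sum of its per-modality scores."""
    shots = set()
    for scores in modality_scores.values():
        shots.update(scores)
    return {
        shot: sum(weights[m] * modality_scores.get(m, {}).get(shot, 0.0)
                  for m in weights)
        for shot in shots
    }


# Online step (assumed values): relatedness of one query to each concept.
query_concept = {"car": 0.8, "road": 0.2}
weights = fusion_weights(query_concept, CONCEPT_MODALITY, MODALITIES)

# Per-modality retrieval scores for two hypothetical video shots.
modality_scores = {
    "text": {"shot_1": 1.0, "shot_2": 0.0},
    "visual": {"shot_1": 0.0, "shot_2": 1.0},
    "concept": {"shot_1": 0.5, "shot_2": 0.5},
}
fused = fuse_scores(weights, modality_scores)
```

In this sketch only `query_concept` has to be computed per query; `CONCEPT_MODALITY` is fixed ahead of time, which mirrors the online/offline split the abstract describes.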

Keywords

Concept-driven fusion, multi-modality, semantic concept, video search

Discipline

Data Storage Systems | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

IEEE Transactions on Circuits and Systems for Video Technology

Volume

Issue

First Page

Last Page

ISSN

1051-8215

Identifier

10.1109/TCSVT.2011.2105597

Publisher

Institute of Electrical and Electronics Engineers

Citation

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Data Storage Systems Commons, Graphics and Human Computer Interfaces Commons

Research Collection School Of Computing and Information Systems

Concept-driven multi-modality fusion for video search

