Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

7-2010

Abstract

Current content-based video copy detection approaches mostly concentrate on visual cues and neglect audio information. In this paper, we tackle the video copy detection task using audio information, which is as important as visual information in multimedia processing. First, inspired by the bag-of-visual-words (BoW) model, a bag-of-audio-words (BoA) representation is proposed to characterize each audio frame. Unlike naive single-based modeling approaches to audio retrieval, BoA is a high-level model owing to its perceptual and semantic properties. Within the BoA model, a coherency vocabulary indexing structure is adopted to achieve more efficient and effective indexing than the single vocabulary of the standard BoW model. The coherency vocabulary takes advantage of multiple audio features by computing their co-occurrence across different feature spaces. By enforcing a tight coherency constraint across feature spaces, the coherency vocabulary makes the BoA model more discriminative and more robust to various audio transforms. A 2D Hough transform is then applied to aggregate scores from matched audio segments. The segments that fall into the peak bin are identified as the copy segments in the reference video. In addition, we perform video copy detection using both audio and visual cues, applying four late fusion strategies to demonstrate the complementarity of audio and visual information in video copy detection. Extensive experiments are conducted on the large-scale TRECVID 2009 dataset, and competitive results are achieved.
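
The Hough-style score aggregation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which two dimensions the Hough space uses, so the sketch assumes bins indexed by reference video and temporal offset. The function name `hough_vote`, the `bin_width` parameter, and the sample matches are all hypothetical.

```python
from collections import defaultdict


def hough_vote(matches, bin_width=1.0):
    """Accumulate match scores in (reference id, time-offset bin) cells.

    matches: iterable of (query_time, ref_id, ref_time, score) tuples.
    Returns the peak bin, its total score, and the matches that voted for it.
    The choice of axes (reference id, offset) is an assumption of this sketch.
    """
    accumulator = defaultdict(float)
    members = defaultdict(list)
    for query_time, ref_id, ref_time, score in matches:
        # Matches from a true copy should share a near-constant time offset.
        offset_bin = int((ref_time - query_time) // bin_width)
        key = (ref_id, offset_bin)
        accumulator[key] += score
        members[key].append((query_time, ref_time))
    if not accumulator:
        return None, 0.0, []
    # The bin with the highest accumulated score is taken as the detection.
    peak = max(accumulator, key=accumulator.get)
    return peak, accumulator[peak], members[peak]


# Hypothetical matches: three consistent hits on "ref_a", one stray on "ref_b".
matches = [
    (0.0, "ref_a", 10.0, 0.9),
    (1.0, "ref_a", 11.0, 0.8),
    (2.0, "ref_a", 12.0, 0.7),
    (1.0, "ref_b", 40.0, 0.6),
]
peak, total, segments = hough_vote(matches)
```

In this toy input the three `ref_a` matches share the same offset and so vote into one bin, while the isolated `ref_b` match lands in a bin of its own with a lower total.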

Keywords

Audio words, Coherency vocabulary, Copy detection

Discipline

Data Storage Systems | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the ACM International Conference on Image and Video Retrieval, ACM-CIVR 2010, Xi’an, China, July 5-7

First Page

89

Last Page

96

ISBN

9781450301176

Identifier

10.1145/1816041.1816057

Publisher

ACM

City or Country

Xi'an, China

Citation

LIU, Yang; ZHAO, Wan-Lei; NGO, Chong-wah; XU, Chang-Sheng; and LU, Han-Qing. Coherent bag-of audio words model for efficient large-scale video copy detection. (2010). Proceedings of the ACM International Conference on Image and Video Retrieval, ACM-CIVR 2010, Xi’an, China, July 5-7. 89-96.
Available at: https://ink.library.smu.edu.sg/sis_research/6522

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Data Storage Systems Commons, Graphics and Human Computer Interfaces Commons

Research Collection School Of Computing and Information Systems

Coherent bag-of audio words model for efficient large-scale video copy detection
