Research Collection School Of Computing and Information Systems

Improving interpretable embeddings for ad-hoc video search with generative captions and multi-word concept bank

Jiaxin WU
Chong-wah NGO, Singapore Management University
Wing-Kwong CHAN

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

6-2024

Abstract

Two mainstream approaches to ad-hoc video search (AVS) are aligning a user query with video clips in a cross-modal latent space and aligning them through semantic concepts. However, the effectiveness of existing approaches is bottlenecked by the small size of available video-text datasets and the low quality of concept banks, which lead to failures on unseen queries and to the out-of-vocabulary problem. This paper addresses these two problems by constructing a new dataset and developing a multi-word concept bank. Specifically, capitalizing on a generative model, we construct a new dataset consisting of 7 million generated text and video pairs for pre-training. To tackle the out-of-vocabulary problem, we develop a multi-word concept bank based on syntax analysis to enhance the capability of a state-of-the-art interpretable AVS method in modelling relationships between query words. We also study the impact of current advanced features on the method. Experimental results show that integrating the proposed elements doubles the R@1 performance of the AVS method on the MSRVTT dataset and improves the xinfAP on the TRECVid AVS query sets for 2016-2023 (eight years) by margins ranging from 2% to 77%, with an average of about 20%. The code and model are available at https://github.com/nikkiwoo-gh/Improved-ITV.
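
For readers unfamiliar with the R@1 figure quoted in the abstract, the following is a minimal sketch of how Recall@K is commonly computed for text-to-video retrieval. It is not taken from the authors' code. The function name `recall_at_k`, the toy similarity matrix, and the assumption that query i has exactly one ground-truth video (video i) are all illustrative.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose ground-truth video ranks in the top k.

    similarity[i][j] is the score of query i against video j.
    Assumption for this sketch: the ground truth for query i is video i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the ground truth = number of videos scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy scores for 3 queries against 3 videos (made-up numbers).
toy_similarity = [
    [0.9, 0.1, 0.3],  # query 0: ground truth ranked first
    [0.2, 0.4, 0.8],  # query 1: ground truth ranked second
    [0.1, 0.2, 0.7],  # query 2: ground truth ranked first
]

r1 = recall_at_k(toy_similarity, k=1)
r2 = recall_at_k(toy_similarity, k=2)
```

In this toy case two of the three queries retrieve their ground-truth video at rank one, so `r1` is 2/3, while `r2` is 1.0. Doubling R@1, as the abstract reports on MSRVTT, means twice as many queries return the correct video as the top result.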

Keywords

Ad-hoc video search, Interpretable embedding, Large-scale video-text dataset, Concept bank construction, Out of vocabulary

Discipline

Databases and Information Systems | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval, Phuket, Thailand, June 10-14

First Page

73

Last Page

82

ISBN

9798400706196

Identifier

10.1145/3652583.3658052

Publisher

ACM

City or Country

New York

Citation

WU, Jiaxin; NGO, Chong-wah; and CHAN, Wing-Kwong. Improving interpretable embeddings for ad-hoc video search with generative captions and multi-word concept bank. (2024). ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval, Phuket, Thailand, June 10-14. 73-82.
Available at: https://ink.library.smu.edu.sg/sis_research/9288

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1145/3652583.3658052

Included in

Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons
