Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

2-2024

Abstract

While embedding techniques such as CLIP have considerably boosted search performance, user strategies in interactive video search still largely operate on a trial-and-error basis. Users are often required to manually adjust their queries and carefully inspect the search results, a process that relies heavily on the user's capability and proficiency. Recent advancements in large language models (LLMs) and generative models offer promising avenues for enhancing interactivity in video retrieval and reducing personal bias in query interpretation, particularly in known-item search. Specifically, LLMs can expand and diversify the semantics of a query while avoiding grammatical mistakes and mitigating language barriers. In addition, generative models can imagine, or visualize, a verbose query as images. We integrate these new LLM capabilities into our existing system and evaluate their effectiveness on the V3C1 and V3C2 datasets.
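
As a hedged illustration of the pipeline the abstract describes (a minimal sketch, not the authors' implementation), the Python fragment below pairs LLM-based query expansion with CLIP text-embedding retrieval; the checkpoint name, the expand_query placeholder, and the max-over-variants scoring are all assumptions made for this example.

    # Illustrative sketch only; model choice and helper names are assumed.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    clip = SentenceTransformer("clip-ViT-B-32")  # one public CLIP checkpoint

    def expand_query(query: str, n: int = 5) -> list[str]:
        """Placeholder for an LLM call that paraphrases the query into
        n grammatical, semantically diverse variants."""
        return [query]  # substitute a real chat-completion call here

    def search(query: str, shot_embeddings: np.ndarray, top_k: int = 10):
        """Rank precomputed, L2-normalized CLIP shot embeddings against
        every LLM-generated variant of the text query."""
        q = clip.encode(expand_query(query), normalize_embeddings=True)
        scores = shot_embeddings @ q.T    # cosine similarities, (n_shots, n_variants)
        best = scores.max(axis=1)         # keep each shot's best-matching variant
        return np.argsort(-best)[:top_k]  # indices of the top-k shots

In the same spirit, the variants returned by expand_query could be rendered as images by a text-to-image model and embedded with the same CLIP encoder, turning the verbose query into a query-by-example.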

Keywords

Generative Model, Interactive Video Retrieval, Known-Item Search, Large Language Models

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

MultiMedia Modeling: MMM 2024 International Conference, Amsterdam, January 29 - February 2, 2024: Proceedings

Volume

14557

First Page

1

Last Page

7

ISBN

9783031533013

Identifier

10.1007/978-3-031-53302-0_35

Publisher

Springer

City or Country

Cham

Additional URL

https://doi.org/10.1007/978-3-031-53302-0_35
