Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
2-2024
Abstract
While embedding techniques such as CLIP have considerably boosted search performance, user strategies in interactive video search still largely operate on a trial-and-error basis. Users are often required to manually adjust their queries and carefully inspect the search results, a process that relies heavily on their capability and proficiency. Recent advances in large language models (LLMs) and generative models offer promising avenues for enhancing interactivity in video retrieval and reducing personal bias in query interpretation, particularly in known-item search. Specifically, LLMs can expand and diversify the semantics of queries while avoiding grammatical mistakes and overcoming language barriers. In addition, generative models can imagine or visualize verbose queries as images. We integrate these new capabilities into our existing system and evaluate their effectiveness on the V3C1 and V3C2 datasets.
Keywords
Generative Model, Interactive Video Retrieval, Known-Item Search, Large Language Models
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
MultiMedia Modeling, MMM 2024: International Conference, Amsterdam, January 29 - February 2: Proceedings
Volume
14557
First Page
1
Last Page
7
ISBN
9783031533013
Identifier
10.1007/978-3-031-53302-0_35
Publisher
Springer
City or Country
Cham
Citation
MA, Zhixin; WU, Jiaxin; and NGO, Chong-wah.
Leveraging LLMs and generative models for interactive known-item video search. (2024). MultiMedia Modeling, MMM 2024: International Conference, Amsterdam, January 29 - February 2: Proceedings. 14557, 1-7.
Available at: https://ink.library.smu.edu.sg/sis_research/8748
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-031-53302-0_35