Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

2-2024

Abstract

While embedding techniques such as CLIP have considerably boosted search performance, user strategies in interactive video search still largely operate on a trial-and-error basis. Users are often required to manually adjust their queries and carefully inspect the search results, a process that relies heavily on the user's capability and proficiency. Recent advancements in large language models (LLMs) and generative models offer promising avenues for enhancing interactivity in video retrieval and reducing personal bias in query interpretation, particularly in known-item search. Specifically, LLMs can expand and diversify the semantics of a query while avoiding grammatical mistakes and mitigating language barriers. In addition, generative models can imagine, or visualize, a verbose query as images. We integrate these new LLM capabilities into our existing system and evaluate their effectiveness on the V3C1 and V3C2 datasets.
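
As a hedged illustration of the pipeline the abstract describes (a minimal sketch, not the authors' implementation), the Python fragment below pairs LLM-based query expansion with CLIP text-embedding retrieval; the checkpoint name, the expand_query placeholder, and the max-over-variants scoring are all assumptions made for this example.

    # Illustrative sketch only; model choice and helper names are assumed.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    clip = SentenceTransformer("clip-ViT-B-32")  # one public CLIP checkpoint

    def expand_query(query: str, n: int = 5) -> list[str]:
        """Placeholder for an LLM call that paraphrases the query into
        n grammatical, semantically diverse variants."""
        return [query]  # substitute a real chat-completion call here

    def search(query: str, shot_embeddings: np.ndarray, top_k: int = 10):
        """Rank precomputed, L2-normalized CLIP shot embeddings against
        every LLM-generated variant of the text query."""
        q = clip.encode(expand_query(query), normalize_embeddings=True)
        scores = shot_embeddings @ q.T    # cosine similarities, (n_shots, n_variants)
        best = scores.max(axis=1)         # keep each shot's best-matching variant
        return np.argsort(-best)[:top_k]  # indices of the top-k shots

In the same spirit, the variants returned by expand_query could be rendered as images by a text-to-image model and embedded with the same CLIP encoder, turning the verbose query into a query-by-example.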

Keywords

Generative Model, Interactive Video Retrieval, Known-Item Search, Large Language Models

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

MultiMedia Modeling: MMM 2024 International Conference, Amsterdam, January 29 - February 2, 2024: Proceedings

Volume

14557

First Page

1

Last Page

7

ISBN

9783031533013

Identifier

10.1007/978-3-031-53302-0_35

Publisher

Springer

City or Country

Cham

Additional URL

https://doi.org/10.1007/978-3-031-53302-0_35
