Research Collection School Of Computing and Information Systems

Reinforcement learning-based interactive video search

Zhixin MA, Singapore Management University
Jiaxin WU, City University of Hong Kong
Zhijian HOU, City University of Hong Kong
Chong-wah NGO, Singapore Management University

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2022

Abstract

Despite the rapid progress in text-to-video search due to the advancement of cross-modal representation learning, existing techniques still fall short in helping users rapidly identify search targets. In particular, when a system suggests a long list of similar candidates, the user needs to painstakingly inspect every search result. The experience is frustrating because similar clips are watched repeatedly and, worse, the search targets may be overlooked due to mental fatigue. This paper explores reinforcement learning (RL)-based searching to relieve the user of the burden of brute-force inspection. Specifically, the system maintains a graph connecting shots based on their temporal and semantic relationships. Using the navigation paths outlined by the graph, an RL agent learns to seek a path that maximizes the reward based on continuous user feedback. In each round of interaction, the system recommends the single most likely video candidate for the user to inspect. In addition to RL, two incremental changes are introduced to improve the VIREO search engine. First, the dual-task cross-modal representation learning has been revised to index phrases and to model user queries and unlikelihood relationships more effectively. Second, two more deep features, extracted from SlowFast and Swin-Transformer respectively, are incorporated into dual-task model training. Substantial improvement is observed for the automatic ad-hoc video search (AVS) task on the V3C1 dataset.
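
The graph-navigation idea in the abstract can be illustrated with a small sketch. The abstract does not specify the RL formulation, so the code below substitutes plain tabular Q-learning as an assumption, and every name in it (the toy graph `GRAPH`, `simulated_feedback`, `train`, `recommend_path`, and the reward values) is hypothetical. User feedback is simulated by a fixed rule rather than collected from a person, and the edges stand in for temporal or semantic links between shots.

```python
import random

# Hypothetical toy graph: each shot lists its temporal/semantic neighbours.
GRAPH = {
    "s0": ["s1", "s2"],
    "s1": ["s0", "s3"],
    "s2": ["s0", "s3", "s4"],
    "s3": ["s1", "s2", "s5"],
    "s4": ["s2", "s5"],
    "s5": ["s3", "s4"],
}


def simulated_feedback(shot, target):
    """Stand-in for user feedback (assumption): +1 on the target shot,
    a small penalty for every other recommended shot."""
    return 1.0 if shot == target else -0.1


def train(graph, start, target, episodes=500, max_rounds=10,
          alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over graph edges; one recommendation per round."""
    rng = random.Random(seed)
    q = {(s, n): 0.0 for s, nbrs in graph.items() for n in nbrs}
    for _ in range(episodes):
        shot = start
        for _ in range(max_rounds):
            nbrs = graph[shot]
            if rng.random() < epsilon:
                nxt = rng.choice(nbrs)  # explore a random neighbour
            else:
                nxt = max(nbrs, key=lambda n: q[(shot, n)])  # exploit
            reward = simulated_feedback(nxt, target)
            done = nxt == target
            best_next = 0.0 if done else max(q[(nxt, n)] for n in graph[nxt])
            # Standard Q-learning update for the traversed edge.
            q[(shot, nxt)] += alpha * (reward + gamma * best_next - q[(shot, nxt)])
            shot = nxt
            if done:
                break
    return q


def recommend_path(graph, q, start, target, max_rounds=10):
    """Follow the greedy policy, recommending one shot per round."""
    path = [start]
    shot = start
    for _ in range(max_rounds):
        if shot == target:
            break
        shot = max(graph[shot], key=lambda n: q[(shot, n)])
        path.append(shot)
    return path


q_table = train(GRAPH, start="s0", target="s5")
path = recommend_path(GRAPH, q_table, start="s0", target="s5")
print(path)
```

In this sketch the per-step penalty is what discourages long detours, loosely mirroring the goal of sparing the user from inspecting many similar clips; a real system would replace `simulated_feedback` with actual user responses.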

Keywords

Feature enhancement, Interactive video retrieval, Query understanding, Reinforcement learning

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

MMM 2022: Proceedings of the 28th International Conference, Phu Quoc, June 6-10

Volume

13142

First Page

549

Last Page

555

ISBN

9783030983543

Identifier

10.1007/978-3-030-98355-0_53

Publisher

Springer

City or Country

Cham

Citation

MA, Zhixin; WU, Jiaxin; HOU, Zhijian; and NGO, Chong-wah. Reinforcement learning-based interactive video search. (2022). MMM 2022: Proceedings of the 28th International Conference, Phu Quoc, June 6-10. 13142, 549-555.
Available at: https://ink.library.smu.edu.sg/sis_research/7503

Copyright Owner and License

Publisher

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1007/978-3-030-98355-0_53

Included in

Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons