Research Collection School Of Computing and Information Systems

Modularized zero-shot VQA with pre-trained models

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

7-2023

Abstract

Large-scale pre-trained models (PTMs) show great zero-shot capabilities. In this paper, we study how to leverage them for zero-shot visual question answering (VQA). Our approach is motivated by a few observations. First, VQA questions often require multiple steps of reasoning, which is still a capability that most PTMs lack. Second, different steps in VQA reasoning chains require different skills such as object detection and relational reasoning, but a single PTM may not possess all these skills. Third, recent work on zero-shot VQA does not explicitly consider multi-step reasoning chains, which makes them less interpretable compared with a decomposition-based approach. We propose a modularized zero-shot network that explicitly decomposes questions into sub reasoning steps and is highly interpretable. We convert sub reasoning tasks to acceptable objectives of PTMs and assign tasks to proper PTMs without any adaptation. Our experiments on two VQA benchmarks under the zero-shot setting demonstrate the effectiveness of our method and better interpretability compared with several baselines.

Keywords

Computational linguistics, Zero-shot learning, Object detection

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Publication

Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, 2023 July 9-14

First Page

58

Last Page

76

ISBN

978-1-959429-62-3

Identifier

10.18653/v1/2023.findings-acl.5

Publisher

Association for Computational Linguistics

City or Country

Texas, USA

Citation

CAO, Rui and JIANG, Jing. Modularized zero-shot VQA with pre-trained models. (2023). Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, 2023 July 9-14. 58-76.
Available at: https://ink.library.smu.edu.sg/sis_research/8307

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.18653/v1/2023.findings-acl.5

Included in

Artificial Intelligence and Robotics Commons
