Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2019
Abstract
Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information within and across the visual and language modalities. It can robustly capture the high-level interactions between the language and vision domains, and thus significantly improves the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
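The two kinds of information flow named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the feature sizes, the absence of learned projection layers, and in particular the sigmoid gate built from the mean-pooled other modality are all assumptions made for illustration. The abstract states only that the intra-modality attention is conditioned on the other modality.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: each query row becomes a weighted mix of value rows.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def inter_modality_flow(vis, txt):
    # Cross-attention: each modality gathers information from the other (residual update).
    vis_new = vis + attention(vis, txt, txt)
    txt_new = txt + attention(txt, vis, vis)
    return vis_new, txt_new

def dynamic_intra_modality_flow(x, other):
    # Self-attention within x whose queries and keys are rescaled by a gate
    # computed from the other modality (ASSUMED form: sigmoid of its mean-pooled features).
    gate = 1.0 / (1.0 + np.exp(-other.mean(axis=0)))  # shape (d,)
    q = x * (1.0 + gate)
    k = x * (1.0 + gate)
    return x + attention(q, k, x)

rng = np.random.default_rng(0)
vis = rng.normal(size=(5, 8))  # 5 image regions, 8-dim features (hypothetical sizes)
txt = rng.normal(size=(3, 8))  # 3 question words, 8-dim features (hypothetical sizes)

# One alternation: inter-modality flow followed by dynamic intra-modality flow.
vis_mid, txt_mid = inter_modality_flow(vis, txt)
vis_out = dynamic_intra_modality_flow(vis_mid, txt_mid)
txt_out = dynamic_intra_modality_flow(txt_mid, vis_mid)
```

Stacking several such alternations would correspond to passing information repeatedly within and across the two modalities. A practical model would also add learned query, key, and value projections and multiple attention heads, which this sketch omits.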
Keywords
Vision + Language, Vision Applications and Systems, Visual Reasoning
Discipline
Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): Long Beach, CA, June 15-20: Proceedings
First Page
6632
Last Page
6641
ISBN
9781728132938
Identifier
10.1109/CVPR.2019.00680
Publisher
IEEE Computer Society
City or Country
Los Alamitos, CA
Citation
GAO, Peng; JIANG, Zhengkai; YOU, Haoxuan; LU, Pan; HOI, Steven C. H.; WANG, Xiaogang; and LI, Hongsheng.
Dynamic fusion with intra- and inter-modality attention flow for visual question answering. (2019). 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): Long Beach, CA, June 15-20: Proceedings. 6632-6641.
Available at: https://ink.library.smu.edu.sg/sis_research/5260
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/CVPR.2019.00680