Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
9-2018
Abstract
In this paper, we propose a novel Question-Guided Hybrid Convolution (QGHC) network for Visual Question Answering (VQA). Most state-of-the-art VQA methods fuse high-level textual and visual features from the neural network and discard visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolve with visual features, capturing the relationship between textual and visual information at an early stage. Question-guided convolution tightly couples textual and visual information but also introduces more parameters when learning kernels. We apply group convolution, which consists of question-independent kernels and question-dependent kernels, to reduce the parameter size and alleviate over-fitting. The hybrid convolution can generate discriminative multi-modal features with fewer parameters. The proposed approach is also complementary to existing bilinear pooling fusion and attention-based VQA methods. By integrating with them, our method could further boost performance. Extensive experiments on public VQA datasets validate the effectiveness of QGHC.
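The abstract's central mechanism, a group convolution in which some groups use learned question-independent kernels while others use kernels predicted from the question embedding, can be illustrated with a small NumPy sketch. It rests on assumptions not stated in this record: 1x1 kernels, a single linear layer as the kernel predictor, and an even split of input channels across groups. The function and parameter names (`hybrid_group_conv1x1`, `static_kernels`, `predictors`) are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def hybrid_group_conv1x1(v, q, static_kernels, predictors):
    """Hypothetical question-guided hybrid group convolution with 1x1 kernels.

    v: visual feature map, shape (C_in, H, W)
    q: question embedding, shape (D,)
    static_kernels: question-independent kernels, each of shape (C_out_g, C_in_g)
    predictors: linear maps, each of shape (C_out_g * C_in_g, D), that predict
        one question-dependent kernel from q
    Input channels are split evenly across groups; static groups come first.
    """
    c_in, h, w = v.shape
    n_groups = len(static_kernels) + len(predictors)
    c_in_g = c_in // n_groups

    # Question-independent kernels are used as-is; question-dependent ones are
    # predicted from the question embedding and reshaped into a kernel matrix.
    kernels = list(static_kernels)
    for p in predictors:
        c_out_g = p.shape[0] // c_in_g
        kernels.append((p @ q).reshape(c_out_g, c_in_g))

    # Each group convolves only its own slice of input channels. A 1x1
    # convolution is a matrix product applied at every spatial position.
    outputs = []
    for g, k in enumerate(kernels):
        x = v[g * c_in_g:(g + 1) * c_in_g].reshape(c_in_g, h * w)
        outputs.append((k @ x).reshape(k.shape[0], h, w))
    return np.concatenate(outputs, axis=0)


# Toy sizes: 8 input channels, 4x4 map, 16-dim question, 4 groups (2 static, 2 guided).
rng = np.random.default_rng(0)
c_in, c_out, h, w, d = 8, 8, 4, 4, 16
v = rng.standard_normal((c_in, h, w))
q1 = rng.standard_normal(d)
q2 = rng.standard_normal(d)
static_kernels = [rng.standard_normal((2, 2)) for _ in range(2)]
predictors = [rng.standard_normal((2 * 2, d)) for _ in range(2)]

out1 = hybrid_group_conv1x1(v, q1, static_kernels, predictors)
out2 = hybrid_group_conv1x1(v, q2, static_kernels, predictors)
print(out1.shape)  # (8, 4, 4)

# Parameters needed to predict one full (ungrouped) c_out x c_in kernel from q,
# versus the hybrid layer's static kernels plus its per-group predictors.
full_params = c_out * c_in * d
hybrid_params = sum(k.size for k in static_kernels) + sum(p.size for p in predictors)
print(full_params, hybrid_params)  # 1024 136
```

In this toy setting, changing the question leaves the outputs of the static groups unchanged while the question-guided groups respond to it, and predicting only small per-group kernels needs 136 weights instead of the 1,024 required to predict a full 8x8 kernel matrix from a 16-dimensional question vector.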
Keywords
VQA, Dynamic Parameter Prediction, Group Convolution
Discipline
Databases and Information Systems | Theory and Algorithms
Research Areas
Data Science and Engineering
Publication
Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings
Volume
11205
First Page
485
Last Page
501
ISBN
9783030012465
Identifier
10.1007/978-3-030-01246-5_29
Publisher
Springer
City or Country
Cham
Citation
GAO, Peng; LU, Pan; LI, Hongsheng; LI, Shuang; LI, Yikang; HOI, Steven C. H.; and WANG, Xiaogang.
Question-guided hybrid convolution for visual question answering. (2018). Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings. 11205, 485-501.
Available at: https://ink.library.smu.edu.sg/sis_research/4182
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-030-01246-5_29