Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

9-2018

Abstract

In this paper, we propose a novel Question-Guided Hybrid Convolution (QGHC) network for Visual Question Answering (VQA). Most state-of-the-art VQA methods fuse high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolve with the visual features and capture the textual-visual relationship at an early stage. The question-guided convolution tightly couples the textual and visual information, but it also introduces more parameters when learning the kernels. We apply group convolution, which consists of question-independent kernels and question-dependent kernels, to reduce the parameter size and alleviate over-fitting. The hybrid convolution can generate discriminative multi-modal features with fewer parameters. The proposed approach is also complementary to existing bilinear pooling fusion and attention-based VQA methods; by integrating with them, our method could further boost the performance. Extensive experiments on public VQA datasets validate the effectiveness of QGHC.
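
The following is a minimal, illustrative sketch of the idea described above: a group convolution over visual features in which some groups use ordinary learned (question-independent) kernels and the remaining groups use kernels predicted from the question embedding. All names, shapes, and the 50/50 split between static and dynamic groups are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QGHCConvSketch(nn.Module):
    """Sketch of a question-guided hybrid group convolution (hypothetical)."""

    def __init__(self, in_ch=256, out_ch=256, q_dim=1024, groups=8, k=3):
        super().__init__()
        assert in_ch % groups == 0 and out_ch % groups == 0
        self.groups, self.k = groups, k
        self.in_g, self.out_g = in_ch // groups, out_ch // groups
        # Question-independent kernels: learned directly (half of the groups here).
        self.static_groups = groups // 2
        self.dynamic_groups = groups - self.static_groups
        self.static_conv = nn.Conv2d(
            self.static_groups * self.in_g,
            self.static_groups * self.out_g,
            kernel_size=k, padding=k // 2, groups=self.static_groups)
        # Question-dependent kernels: predicted per sample from the question embedding.
        n_dyn_params = self.dynamic_groups * self.out_g * self.in_g * k * k
        self.kernel_predictor = nn.Linear(q_dim, n_dyn_params)

    def forward(self, v, q):
        # v: (B, in_ch, H, W) visual features; q: (B, q_dim) question embedding.
        B, _, H, W = v.shape
        split = self.static_groups * self.in_g
        v_static, v_dyn = v[:, :split], v[:, split:]

        # Question-independent branch: an ordinary group convolution.
        out_static = self.static_conv(v_static)

        # Question-dependent branch: per-sample predicted kernels, applied via
        # a grouped convolution over the batch folded into the channel axis.
        w = self.kernel_predictor(q).view(
            B * self.dynamic_groups * self.out_g, self.in_g, self.k, self.k)
        v_dyn = v_dyn.reshape(1, B * self.dynamic_groups * self.in_g, H, W)
        out_dyn = F.conv2d(v_dyn, w, padding=self.k // 2,
                           groups=B * self.dynamic_groups)
        out_dyn = out_dyn.view(B, self.dynamic_groups * self.out_g, H, W)

        # Concatenate question-independent and question-dependent outputs.
        return torch.cat([out_static, out_dyn], dim=1)

# Example usage with made-up sizes:
# layer = QGHCConvSketch()
# out = layer(torch.randn(2, 256, 14, 14), torch.randn(2, 1024))  # (2, 256, 14, 14)
```

Predicting kernels for only a subset of groups keeps the linear predictor small (its output dimension grows with the number of dynamic kernel parameters), which is the parameter-reduction and over-fitting argument made in the abstract.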

Keywords

VQA, Dynamic Parameter Prediction, Group Convolution

Discipline

Databases and Information Systems | Theory and Algorithms

Research Areas

Data Science and Engineering

Publication

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings

Volume

11205

First Page

485

Last Page

501

ISBN

9783030012465

Identifier

10.1007/978-3-030-01246-5_29

Publisher

Springer

City or Country

Cham

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1007/978-3-030-01246-5_29
