Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
3-2022
Abstract
We focus on the confounding bias between language and location in the visual grounding pipeline, which we find to be the major bottleneck for visual reasoning. For example, the grounding process is usually a trivial language-location association without visual reasoning: any language query containing "sheep" is grounded to nearly central regions, because most queries about sheep have ground-truth locations at the image center. First, we frame the visual grounding pipeline as a causal graph, which shows the causalities among the image, the query, the target location, and the underlying confounder. The causal graph shows how to break the grounding bottleneck: deconfounded visual grounding. Second, to tackle the challenge that the confounder is in general unobserved, we propose a confounder-agnostic approach called Referring Expression Deconfounder (RED) to remove the confounding bias. Third, we implement RED as a simple language attention, which can be applied to any grounding method.
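The abstract does not spell out the adjustment that RED approximates. As a hedged illustration of what "deconfounding" means in general, the toy below applies the standard backdoor adjustment, P(l | do(q)) = Σ_g P(l | q, g) P(g), to a made-up two-valued confounder g. All probabilities and function names are hypothetical, and the toy assumes g is observed, which is exactly the assumption RED is designed to avoid. It is therefore a sketch of the general idea, not the authors' method.

```python
from fractions import Fraction as F

# Hypothetical toy distributions (not taken from the paper).
# g: a confounder (e.g. a dataset-collection bias); observed here only for illustration.
P_G = {0: F(1, 2), 1: F(1, 2)}
# P(query mentions "sheep" | g)
P_SHEEP_GIVEN_G = {0: F(1, 10), 1: F(9, 10)}
# P(target lies at the image center | query mentions "sheep", g)
P_CENTER_GIVEN_SHEEP_G = {0: F(3, 10), 1: F(9, 10)}


def observational_center_prob():
    """P(center | sheep): each g is weighted by P(g | sheep), so the confounder leaks in."""
    evidence = sum(P_SHEEP_GIVEN_G[g] * P_G[g] for g in P_G)  # P(sheep)
    return sum(
        P_CENTER_GIVEN_SHEEP_G[g] * P_SHEEP_GIVEN_G[g] * P_G[g] / evidence
        for g in P_G
    )


def interventional_center_prob():
    """P(center | do(sheep)): backdoor adjustment, weighting each g by its prior P(g)."""
    return sum(P_CENTER_GIVEN_SHEEP_G[g] * P_G[g] for g in P_G)


if __name__ == "__main__":
    print("P(center | sheep)     =", observational_center_prob())  # 21/25
    print("P(center | do(sheep)) =", interventional_center_prob())  # 3/5
```

In this toy, the observational estimate (21/25) overstates how strongly "sheep" alone predicts a central target, because sheep queries mostly come from g = 1, where central targets dominate; the adjusted estimate (3/5) removes that spurious language-location shortcut.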
Keywords
Computer Vision (CV)
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 36th AAAI Conference on Artificial Intelligence, Virtual Conference, 2022 February 2 - March 1
First Page
998
Last Page
1006
Identifier
10.1609/aaai.v36i1.19983
Publisher
AAAI
City or Country
Virtual Conference
Citation
HUANG, Jianqiang; QIN, Yu; QI, Jiaxin; SUN, Qianru; and ZHANG, Hanwang.
Deconfounded visual grounding. (2022). Proceedings of the 36th AAAI Conference on Artificial Intelligence, Virtual Conference, 2022 February 2 - March 1. 998-1006.
Available at: https://ink.library.smu.edu.sg/sis_research/7484
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1609/aaai.v36i1.19983