Question type-aware debiasing for test-time visual question answering model adaptation
Publication Type
Journal Article
Publication Date
6-2024
Abstract
In Visual Question Answering (VQA), addressing language prior bias, where models excessively rely on superficial correlations between questions and answers, is crucial. This issue becomes more pronounced in real-world applications with diverse domains and varied question-answer distributions during testing. To tackle this challenge, Test-time Adaptation (TTA) has emerged, allowing pre-trained VQA models to adapt using unlabeled test samples. Current state-of-the-art models select reliable test samples based on fixed entropy thresholds and employ self-supervised debiasing techniques. However, these methods struggle with the diverse answer spaces linked to different question types and may fail to identify biased samples that still leverage relevant visual context. In this paper, we propose Question type-guided Entropy Minimization and Debiasing (QED) as a solution for test-time VQA model adaptation. Our approach involves adaptive entropy minimization based on question types to improve the identification of fine-grained and unreliable samples. Additionally, we generate negative samples for each test sample and label them as biased if their answer entropy change rate significantly differs from positive test samples, subsequently removing them. We evaluate our approach on two public benchmarks, VQA-CP v2 and VQA-CP v1, and achieve new state-of-the-art results, with overall accuracy rates of 48.13% and 46.18%, respectively.
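The idea of replacing a fixed entropy threshold with one that depends on question type can be illustrated with a minimal sketch. This is not the QED implementation: the abstract does not specify the thresholding rule, so the rule used here (a fixed fraction of the maximum entropy log K for an answer space of size K) is an assumption, and the names `ANSWER_SPACE_SIZE`, `type_threshold`, `select_reliable`, and the `fraction` parameter are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical answer-space sizes per question type (illustrative values only).
ANSWER_SPACE_SIZE = {"yes/no": 2, "number": 10, "other": 100}


def type_threshold(question_type, fraction=0.4):
    """Assumed rule: a fixed fraction of the maximum entropy log(K),
    where K is the answer-space size tied to the question type."""
    return fraction * math.log(ANSWER_SPACE_SIZE[question_type])


def select_reliable(samples, fraction=0.4):
    """Keep samples whose prediction entropy is below their type's threshold.

    Each sample is a (question_type, probs) pair.
    """
    return [
        (qtype, probs)
        for qtype, probs in samples
        if entropy(probs) < type_threshold(qtype, fraction)
    ]


samples = [
    ("yes/no", [0.95, 0.05]),          # confident binary prediction
    ("yes/no", [0.5, 0.5]),            # maximally uncertain binary prediction
    ("number", [0.91] + [0.01] * 9),   # confident prediction over 10 answers
]
reliable = select_reliable(samples)
```

In this toy input the confident "number" sample has higher entropy than the "yes/no" threshold allows, so a single fixed threshold tuned for binary questions would discard it, while the per-type threshold keeps it.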
Keywords
Test-time adaptation, visual question answering, language debiasing
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
IEEE Transactions on Circuits and Systems for Video Technology
Volume
34
Issue
11
First Page
10805
Last Page
10816
ISSN
1051-8215
Identifier
10.1109/TCSVT.2024.3410041
Publisher
Institute of Electrical and Electronics Engineers
Citation
LIU, Jin; XIE, Jialong; ZHOU, Fengyu; and HE, Shengfeng.
Question type-aware debiasing for test-time visual question answering model adaptation. (2024). IEEE Transactions on Circuits and Systems for Video Technology. 34, (11), 10805-10816.
Available at: https://ink.library.smu.edu.sg/sis_research/9804
Additional URL
https://doi.org/10.1109/TCSVT.2024.3410041