Publication Type

Journal Article

Version

acceptedVersion

Publication Date

January 2023

Abstract

Deep learning models often fit undesired dataset biases during training. In this paper, we formulate the bias using causal inference, which helps us uncover the ever-elusive causalities among the key factors in training, and thus pursue the desired causal effect without the bias. We start by revisiting the process of building a visual recognition system, and then propose a structural causal model (SCM) for the key variables involved in dataset collection and the recognition model: object, common sense, bias, context, and label prediction. Based on the SCM, one can observe that there are “good” and “bad” biases. Intuitively, in an image where a car is driving on a highway in a desert, the “good” bias denoting the common-sense context is the highway, and the “bad” bias accounting for the noisy context factor is the desert. We tackle this problem with a novel causal interventional training (CIT) approach, where we control the observed context in each object class. We offer theoretical justifications for CIT and validate it with extensive classification experiments on CIFAR-10, CIFAR-100, and ImageNet, e.g., surpassing the standard deep neural networks ResNet-34 and ResNet-50 by 0.95% and 0.70% in accuracy, respectively, on ImageNet. Our code is open-sourced on GitHub at https://github.com/qinwei-hfut/CIT.

Keywords

Image recognition, causality, causal intervention, deep learning, ImageNet

Discipline

Databases and Information Systems | Graphics and Human Computer Interfaces

Research Areas

Data Science and Engineering

Publication

IEEE Transactions on Multimedia

Volume

25

First Page

1033

Last Page

1044

ISSN

1520-9210

Identifier

10.1109/TMM.2021.3136717

Publisher

Institute of Electrical and Electronics Engineers

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1109/TMM.2021.3136717
