Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2023
Abstract
Numerous pre-training techniques for visual document understanding (VDU) have recently shown substantial improvements in performance across a wide range of document tasks. However, these pre-trained VDU models cannot guarantee continued success when the distribution of test data differs from the distribution of training data. In this paper, to investigate how robust existing pre-trained VDU models are to various distribution shifts, we first develop an out-of-distribution (OOD) benchmark termed Do-GOOD for the fine-Grained analysis on Document image-related tasks specifically. The Do-GOOD benchmark defines the underlying mechanisms that result in different distribution shifts and contains 9 OOD datasets covering 3 VDU-related tasks, i.e., document information extraction, classification and question answering. We then evaluate the robustness and perform a fine-grained analysis of 5 of the latest VDU pre-trained models and 2 typical OOD generalization algorithms on these OOD datasets. Results from the experiments demonstrate that there is a significant performance gap between the in-distribution (ID) and OOD settings for document images, and that fine-grained analysis of distribution shifts can reveal the brittle nature of existing pre-trained VDU models and OOD generalization algorithms. The code and datasets for our Do-GOOD benchmark can be found at https://github.com/MAEHCM/Do-GOOD.
Keywords
out-of-distribution, pre-trained models, visual document understanding, document information extraction
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, July 23-27
First Page
569
Last Page
579
ISBN
9781450394086
Identifier
10.1145/3539618.3591670
Publisher
ACM
City or Country
New York
Citation
HE, Jiabang; HU, Yi; WANG, Lei; XU, Xing; LIU, Ning; and LIU, Hui.
Do-GOOD: Towards distribution shift evaluation for pre-trained visual document understanding models. (2023). SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, July 23-27. 569-579.
Available at: https://ink.library.smu.edu.sg/sis_research/8145
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3539618.3591670
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons