Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2023
Abstract
Numerous pre-training techniques for visual document understanding (VDU) have recently shown substantial improvements in performance across a wide range of document tasks. However, these pre-trained VDU models cannot guarantee continued success when the distribution of test data differs from the distribution of training data. In this paper, to investigate how robust existing pre-trained VDU models are to various distribution shifts, we first develop an out-of-distribution (OOD) benchmark termed Do-GOOD for the fine-Grained analysis on Document image-related tasks specifically. The Do-GOOD benchmark defines the underlying mechanisms that result in different distribution shifts and contains 9 OOD datasets covering 3 VDU-related tasks, i.e., document information extraction, classification and question answering. We then evaluate the robustness and perform a fine-grained analysis of 5 of the latest VDU pre-trained models and 2 typical OOD generalization algorithms on these OOD datasets. Results from the experiments demonstrate that there is a significant performance gap between the in-distribution (ID) and OOD settings for document images, and that fine-grained analysis of distribution shifts can reveal the brittle nature of existing pre-trained VDU models and OOD generalization algorithms. The code and datasets for our Do-GOOD benchmark can be found at https://github.com/MAEHCM/Do-GOOD.
Keywords
out-of-distribution, pre-trained models, visual document understanding, document information extraction
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, July 23-27
First Page
569
Last Page
579
ISBN
9781450394086
Identifier
10.1145/3539618.3591670
Publisher
ACM
City or Country
New York
Citation
HE, Jiabang; HU, Yi; WANG, Lei; XU, Xing; LIU, Ning; and LIU, Hui.
Do-GOOD: Towards distribution shift evaluation for pre-trained visual document understanding models. (2023). SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, July 23-27. 569-579.
Available at: https://ink.library.smu.edu.sg/sis_research/8145
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3539618.3591670
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons