Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

5-2023

Abstract

For unsupervised pretraining, mask-reconstruction pretraining (MRP) approaches, e.g. MAE (He et al., 2021) and data2vec (Baevski et al., 2022), randomly mask input patches and then reconstruct the pixels or semantic features of these masked patches via an auto-encoder. Then, for a downstream task, supervised fine-tuning of the pretrained encoder remarkably surpasses conventional "supervised learning" (SL) trained from scratch. However, it is still unclear 1) how MRP performs semantic feature learning in the pretraining phase and 2) why it helps in downstream tasks. To answer these questions, we first theoretically show that, for an auto-encoder with a two-layer convolutional encoder and a one-layer decoder, MRP can capture all discriminative features of each potential semantic class in the pretraining dataset. Then, because the pretraining dataset is of huge size and high diversity and thus covers most features in the downstream dataset, the pretrained encoder captures as many of the downstream features as it can during fine-tuning and, with theoretical guarantees, does not lose them. In contrast, SL only randomly captures some features, due to the lottery ticket hypothesis. So MRP provably achieves better performance than SL on classification tasks. Experimental results support both our data assumptions and our theoretical implications.
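
The following is a minimal sketch of the mask-reconstruction setup described in the abstract, assuming a PyTorch-style implementation with a two-layer convolutional encoder and a one-layer decoder as in the paper's theoretical setting; the patch size, channel width, masking ratio, and optimizer settings are illustrative assumptions, not the paper's actual configuration.

# Minimal sketch of mask-reconstruction pretraining (MRP), assuming PyTorch.
# Two-layer convolutional encoder and one-layer decoder mirror the abstract's
# theoretical setting; all hyperparameters below are illustrative only.
import torch
import torch.nn as nn


class MRPAutoEncoder(nn.Module):
    def __init__(self, in_ch=3, hidden=64):
        super().__init__()
        # Two-layer convolutional encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
        )
        # One-layer convolutional decoder that reconstructs pixels.
        self.decoder = nn.Conv2d(hidden, in_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.decoder(self.encoder(x))


def random_patch_mask(images, patch=8, mask_ratio=0.75):
    # Zero out a random subset of non-overlapping patches.
    b, _, h, w = images.shape
    mask = torch.rand(b, 1, h // patch, w // patch) < mask_ratio
    mask = mask.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return images * (~mask), mask.float()


# One pretraining step: reconstruct only the masked patches.
model = MRPAutoEncoder()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
images = torch.randn(4, 3, 32, 32)  # stand-in for a pretraining batch
masked, mask = random_patch_mask(images)
recon = model(masked)
loss = ((recon - images) ** 2 * mask).sum() / mask.sum().clamp(min=1)
loss.backward()
opt.step()

After pretraining, the decoder would be discarded and the encoder fine-tuned with a supervised head on the downstream classification task, which is the comparison against training from scratch studied in the paper.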

Discipline

Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 11th International Conference on Learning Representations (ICLR 2023): Kigali, Rwanda, May 1-5

First Page

1

Last Page

48

Publisher

ICLR

City or Country

Kigali, Rwanda

Additional URL

https://openreview.net/forum?id=PaEUQiY40Dk
