Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2022
Abstract
Vision transformers have a more relaxed inductive bias than convolutional networks and therefore do not work well with insufficient data. Knowledge distillation is thus introduced to assist the training of transformers. Unlike previous works, which provide only heavy convolution-based teachers, in this paper we delve into the influence of models' inductive biases in knowledge distillation (e.g., convolution and involution). Our key observation is that teacher accuracy is not the dominant factor in student accuracy; the teacher's inductive bias matters more. We demonstrate that lightweight teachers with different architectural inductive biases can be used to co-advise the student transformer with outstanding performance. The rationale is that models designed with different inductive biases tend to focus on diverse patterns, so teachers with different inductive biases acquire different knowledge despite being trained on the same dataset. This diverse knowledge provides a more precise and comprehensive description of the data, and it compounds to boost the performance of the student during distillation. Furthermore, we propose a token inductive bias alignment that aligns the inductive bias of each token with its target teacher model. With only lightweight teachers provided and using this cross inductive bias distillation method, our vision transformers (termed CiT) outperform all previous vision transformers (ViT) of the same architecture on ImageNet. Moreover, our small size model CiT-SAK further achieves 82.7% Top-1 accuracy on ImageNet without modifying the attention module of the ViT.
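To make the co-advising idea concrete, here is a minimal sketch (not taken from the paper) of what such an objective could look like in PyTorch, assuming a DeiT-style student whose class token is supervised by ground-truth labels while two distillation tokens are each hard-distilled from a different lightweight teacher (one convolution-based, one involution-based). The function name, the hard-label distillation choice, and the equal loss weighting are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def co_advise_loss(cls_logits, conv_tok_logits, inv_tok_logits,
                   conv_teacher_logits, inv_teacher_logits, labels):
    """Hypothetical co-advising objective: the class token follows the
    ground-truth labels, and each distillation token matches the hard
    predictions of its own teacher (DeiT-style hard distillation)."""
    ce = F.cross_entropy(cls_logits, labels)  # ground-truth supervision
    kd_conv = F.cross_entropy(conv_tok_logits,
                              conv_teacher_logits.argmax(dim=-1))  # convolution teacher
    kd_inv = F.cross_entropy(inv_tok_logits,
                             inv_teacher_logits.argmax(dim=-1))    # involution teacher
    return (ce + kd_conv + kd_inv) / 3.0  # equal weighting is an assumption

# Toy usage: random logits for a batch of 4 over 1000 ImageNet classes.
b, c = 4, 1000
student = [torch.randn(b, c, requires_grad=True) for _ in range(3)]
teachers = [torch.randn(b, c) for _ in range(2)]
labels = torch.randint(0, c, (b,))
loss = co_advise_loss(*student, *teachers, labels)
loss.backward()
```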
Keywords
Adversarial attack and defense, Distillation method, Inductive bias, Performance, Representation learning, Size models, Teacher models
Discipline
Databases and Information Systems
Research Areas
Information Systems and Management
Publication
Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, June 19-22
First Page
16752
Last Page
16761
ISBN
9781665469463
Identifier
10.1109/CVPR52688.2022.01627
Publisher
IEEE
City or Country
New Jersey
Citation
REN, Sucheng; GAO, Zhengqi; HUA, Tianyu; XUE, Zihui; TIAN, Yonglong; HE, Shengfeng; and ZHAO, Hang.
Co-advise: Cross inductive bias distillation. (2022). Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, June 19-22. 16752-16761.
Available at: https://ink.library.smu.edu.sg/sis_research/8538
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/CVPR52688.2022.01627