Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

1-2023

Abstract

The Vision Transformer (ViT) is now widely used in computer vision tasks, owing to its self-attention mechanism. However, its model architecture is complex and often hard to comprehend, which makes for a steep learning curve, and ViT developers and users frequently have difficulty interpreting its inner workings. A visualization system is therefore needed to help ViT users understand how the model functions. This paper introduces EL-VIT, an interactive visual analytics system designed to probe the Vision Transformer and facilitate a better understanding of its operations. The system consists of four layers of visualization views. The first three layers are the model overview, the knowledge background graph, and the model detail view. They explain the operation of ViT from three perspectives (overall model architecture, detailed explanation, and mathematical operations), enabling users to understand the underlying principles and the transitions between layers. The fourth layer, the interpretation view, helps ViT users and experts gain a deeper understanding by calculating the cosine similarity between patches. Two usage scenarios demonstrate the effectiveness and usability of EL-VIT in helping ViT users understand the working mechanism of ViT.

Keywords

Education Tool, Explainable AI, Vision Transformer, Visual Analysis

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Research Areas

Data Science and Engineering

Publication

2023 International Conference on Data Mining, ICDM: Shanghai, December 1-4: Proceedings

First Page

118

Last Page

127

ISBN

9798350381641

Identifier

10.1109/ICDMW60847.2023.00023

Publisher

IEEE Computer Society

City or Country

Washington, DC

Citation

ZHOU, Hong; ZHANG, Rui; LAI, Peifeng; GUO, Chaoran; WANG, Yong; SUN, Zhida; and LI, Junjie. EL-VIT: Probing vision transformer with interactive visualization. (2023). 2023 International Conference on Data Mining, ICDM: Shanghai, December 1-4: Proceedings. 118-127.
Available at: https://ink.library.smu.edu.sg/sis_research/8708

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

https://doi.org/10.1109/ICDMW60847.2023.00023

Included in

Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons

Research Collection School Of Computing and Information Systems

EL-VIT: Probing vision transformer with interactive visualization
