Research Collection School Of Computing and Information Systems

CodeS: Towards code model generalization under distribution shift

Qiang HU
Yuejun GUO
Xiaofei XIE, Singapore Management University
Maxime CORDY
Lei MA
Mike PAPADAKIS
Yves Le TRAON

Publication Type

Conference Proceeding Article

Publication Date

5-2023

Abstract

Distribution shift has been a longstanding challenge for the reliable deployment of deep learning (DL) models because it can cause unexpected accuracy degradation. Although DL has become a driving force for large-scale source code analysis in the big code era, limited progress has been made on distribution shift analysis and benchmarking for source code tasks. To fill this gap, this paper proposes CodeS, a distribution shift benchmark dataset for source code learning. Specifically, CodeS supports two programming languages (Java and Python) and five shift types (task, programmer, time-stamp, token, and concrete syntax tree). Extensive experiments based on CodeS reveal that 1) out-of-distribution detectors from other domains (e.g., computer vision) do not generalize to source code, 2) all code classification models suffer from distribution shifts, 3) representation-based shifts have a higher impact on the model than the other shift types, and 4) pre-trained bimodal models are relatively more resistant to distribution shifts.

Keywords

Benchmark datasets, Concrete syntax, Driving forces, Large scale source, Learning models, Model generalization, Source code analysis, Source code learning, Distribution shift, Source codes, Time-stamp

Discipline

Databases and Information Systems

Research Areas

Information Systems and Management

Publication

Proceedings of the 45th International Conference on Software Engineering: New Ideas and Emerging Results, Melbourne, Australia, May 14-20

First Page

1

Last Page

6

ISBN

9798350300390

Identifier

10.1109/ICSE-NIER58687.2023.00007

City or Country

New York

Citation

HU, Qiang; GUO, Yuejun; XIE, Xiaofei; CORDY, Maxime; MA, Lei; PAPADAKIS, Mike; and TRAON, Yves Le. CodeS: Towards code model generalization under distribution shift. (2023). Proceedings of the 45th International Conference on Software Engineering: New Ideas and Emerging Results, Melbourne, Australia, May 14-20. 1-6.
Available at: https://ink.library.smu.edu.sg/sis_research/8244
