Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

4-2024

Abstract

Pre-trained language models of code, which are built upon large-scale datasets, millions of trainable parameters, and high computational resource costs, have achieved phenomenal success. Recently, researchers have proposed a compressor-based classifier (Cbc) that trains no parameters yet is reported to outperform BERT. We conduct the first empirical study to explore whether this lightweight alternative can accurately classify source code. Our study goes beyond simply applying Cbc to code-related tasks. We first identify an issue in the original implementation that causes it to overestimate Cbc's performance. After correction, Cbc's performance on defect prediction drops from 80.7% to 63.0%, which is still comparable to that of CodeBERT (63.7%). We also find that hyperparameter settings affect performance. Moreover, results show that Cbc can outperform CodeBERT when the training data is small, making it a good alternative in low-resource settings.
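The abstract describes Cbc only at a high level. The following is a minimal sketch that assumes the gzip plus k-nearest-neighbour formulation used for parameter-free text classification. Under that formulation, each query is compared to every stored training sample by normalized compression distance (NCD), and the majority label among the `k` closest samples is returned. The function names `compressed_len`, `ncd`, and `classify`, the choice of gzip, the space separator, and the tie-breaking rule are all illustrative assumptions. None of them is taken from the study's implementation.

```python
import gzip
from collections import Counter


def compressed_len(data: bytes) -> int:
    """Length in bytes of the gzip-compressed input (gzip is an assumed choice of compressor)."""
    return len(gzip.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when y adds little new information to x."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + b" " + y)  # the separator is an illustrative assumption
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training samples under NCD.

    Nothing is learned: "training" just means keeping the labelled samples around.
    """
    if not train:
        raise ValueError("training set must not be empty")
    q = query.encode("utf-8")
    # Distance from the query to every stored sample, nearest first.
    ranked = sorted((ncd(q, code.encode("utf-8")), label) for code, label in train)
    top_labels = [label for _, label in ranked[:k]]
    counts = Counter(top_labels)
    best = max(counts.values())
    # Break ties without looking at the gold label: among the labels with the
    # highest vote count, return the one whose neighbour ranks closest.
    for label in top_labels:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: top_labels is non-empty")
```

In this sketch, `k` and the compressor are the obvious settings to tune. The tie-breaking rule deliberately never consults the true label of the query, because an evaluation that resolves ties in favour of the correct answer would report inflated accuracy.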

Keywords

Software Defect Prediction, Efficient Learning, Robustness

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

ICSE-Companion '24: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering, Lisbon, April 14-20

First Page

450

Last Page

452

ISBN

9798400705021

Identifier

10.1145/3639478.3641229

Publisher

IEEE Computer Society

City or Country

Washington, DC

Citation

YANG, Zhou. Classifying source code: How far can compressor-based classifiers go?. (2024). ICSE-Companion '24: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering, Lisbon, April 14-20. 450-452.
Available at: https://ink.library.smu.edu.sg/sis_research/8920

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 License.

Additional URL

https://doi.org/10.1145/3639478.3641229

Included in

Software Engineering Commons

Research Collection School Of Computing and Information Systems

Classifying source code: How far can compressor-based classifiers go?