Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

4-2024

Abstract

Pre-trained language models of code, which are built upon large-scale datasets, millions of trainable parameters, and high computational resources cost, have achieved phenomenal success. Recently, researchers have proposed a compressor-based classifier (Cbc); it trains no parameters but is found to outperform BERT. We conduct the first empirical study to explore whether this lightweight alternative can accurately classify source code. Our study is more than applying Cbc to code-related tasks. We first identify an issue that the original implementation overestimates Cbc. After correction, Cbc's performance on defect prediction drops from 80.7% to 63.0%, which is still comparable to CodeBERT (63.7%). We find that hyperparameter settings affect the performance. Besides, results show that Cbc can outperform CodeBERT when the training data is small, making it a good alternative in low-resource settings.

Keywords

Defect Software Prediction, Efficient Learning, Robustness

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

ICSE-Companion '24: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering, Lisbon, April 14-20

First Page

450

Last Page

452

ISBN

9798400705021

Identifier

10.1145/3639478.3641229

Publisher

IEEE Computer Society

City or Country

Washington, DC

Copyright Owner and License

Authors

Creative Commons License

Creative Commons Attribution 3.0 License
This work is licensed under a Creative Commons Attribution 3.0 License.

Additional URL

https://doi.org/10.1145/3639478.3641229

Share

COinS