Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
4-2024
Abstract
Pre-trained language models of code, which are built upon large-scale datasets, millions of trainable parameters, and high computational resource costs, have achieved phenomenal success. Recently, researchers have proposed a compressor-based classifier (Cbc); it trains no parameters yet has been reported to outperform BERT. We conduct the first empirical study exploring whether this lightweight alternative can accurately classify source code. Our study goes beyond applying Cbc to code-related tasks: we first identify an issue in the original implementation that overestimates Cbc's performance. After correction, Cbc's performance on defect prediction drops from 80.7% to 63.0%, which is still comparable to CodeBERT's (63.7%). We find that hyperparameter settings affect its performance. Moreover, our results show that Cbc can outperform CodeBERT when the training data is small, making it a good alternative in low-resource settings.
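For context, compressor-based classifiers of this kind are commonly implemented as a k-nearest-neighbours scheme over the normalized compression distance (NCD) computed with an off-the-shelf compressor such as gzip. The sketch below illustrates that recipe only; the function names, the k=3 default, and the majority-vote rule are illustrative assumptions, not details taken from the paper.

    import gzip

    def ncd(x: bytes, y: bytes) -> float:
        # Normalized compression distance: how much better x and y compress
        # together than apart, normalized by the larger single size.
        cx = len(gzip.compress(x))
        cy = len(gzip.compress(y))
        cxy = len(gzip.compress(x + b" " + y))
        return (cxy - min(cx, cy)) / max(cx, cy)

    def classify(snippet: str, train: list[tuple[str, int]], k: int = 3) -> int:
        # k-nearest-neighbour vote over NCD; no parameters are trained.
        q = snippet.encode()
        neighbours = sorted((ncd(q, code.encode()), label) for code, label in train)
        top_k = [label for _, label in neighbours[:k]]
        # Strict majority vote; how ties are resolved is exactly the kind of
        # implementation detail that can inflate reported accuracy.
        return max(set(top_k), key=top_k.count)

Because nothing is trained, the only knobs are the compressor, k, and the vote rule, which is consistent with the abstract's observation that hyperparameter settings affect performance.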
Keywords
Software Defect Prediction, Efficient Learning, Robustness
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
ICSE-Companion '24: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering, Lisbon, April 14-20
First Page
450
Last Page
452
ISBN
9798400705021
Identifier
10.1145/3639478.3641229
Publisher
ACM
City or Country
New York
Citation
YANG, Zhou.
Classifying source code: How far can compressor-based classifiers go? (2024). ICSE-Companion '24: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering, Lisbon, April 14-20. 450-452.
Available at: https://ink.library.smu.edu.sg/sis_research/8920
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Additional URL
https://doi.org/10.1145/3639478.3641229