Self-admitted technical debts identification: How far are we?
Publication Type
Conference Proceeding Article
Publication Date
3-2024
Abstract
Self-admitted technical debt (SATD) is technical debt that developers have explicitly acknowledged and that needs additional work or resources to address in the future. Although many methods have been proposed in recent years to detect SATDs, they have mainly focused on the dataset of Java code comments published by Maldonado et al. It is unclear whether methods trained on Maldonado's code comments dataset can effectively find SATD in other programming languages or in other software artifacts, such as issue trackers, pull requests, and commit messages. To answer this question and investigate how far our community has progressed in the field of SATD identification, we first collect a comprehensive dataset from previous papers in the field. It contains SATDs in code comments from four different programming languages (Java, Python, Dockerfile, XML) and SATDs in three other artifacts (issue trackers, pull requests, commit messages). Then, we re-train the existing models on Maldonado's code comments dataset and test all the models on the other programming languages and the other artifacts. The results show that existing SATD identification methods can find SATDs in non-Java languages, but perform poorly in identifying SATDs from the three other artifacts. In addition, to identify SATDs in four different artifacts simultaneously, we develop a Multi-Task Learning model utilizing BERT for SATD identification (MT-BERT-SATD). Across the four artifacts and their SATD identification tasks, MT-BERT-SATD achieves an average F1-score of 0.712 (0.625-0.859), outperforming existing models by 4.6% to 30.4%. Results show that MT-BERT-SATD can effectively identify SATD instances across the explored programming languages and software artifacts, indicating its capability to identify SATD instances in new and unexplored programming languages and software artifacts.
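The abstract reports an F1-score per artifact and an average over the four artifacts. As a minimal sketch of how such an average can be computed, the Python below calculates the F1-score of the SATD class for each artifact and then takes the unweighted mean. The function names, the artifact keys, and the toy labels are all hypothetical and are not taken from the paper; the abstract does not say whether the authors' average is weighted, so the unweighted mean here is an assumption.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class, where 1 = SATD and 0 = non-SATD."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(results):
    """Return per-artifact F1 scores and their unweighted mean.

    `results` maps an artifact name to a (y_true, y_pred) pair.
    """
    scores = {name: f1_score(t, p) for name, (t, p) in results.items()}
    return scores, sum(scores.values()) / len(scores)


# Toy labels for illustration only; these are NOT data from the paper.
toy_results = {
    "code_comments":   ([1, 0, 1, 0], [1, 0, 1, 0]),
    "issue_tracker":   ([1, 1, 0, 0], [1, 0, 0, 0]),
    "pull_requests":   ([1, 0, 0, 1], [1, 1, 0, 1]),
    "commit_messages": ([0, 1, 0, 0], [0, 0, 0, 0]),
}

per_artifact, mean_f1 = average_f1(toy_results)
for name, score in per_artifact.items():
    print(f"{name}: F1 = {score:.3f}")
print(f"average F1 = {mean_f1:.3f}")
```

Reporting the range of per-artifact scores next to the mean, as the abstract does with 0.625-0.859, shows how much the average hides variation between artifacts.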
Keywords
multi-task learning, Self-Admitted Technical Debt, MT-BERT-SATD
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
Proceedings of the 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Rovaniemi, Finland, March 12-15
First Page
804
Last Page
815
Identifier
10.1109/SANER60148.2024.00087
Publisher
IEEE
City or Country
Los Alamitos, CA
Citation
GU, Hao; ZHANG, Shichao; HUANG, Qiao; LIAO, Zhifang; LIU, Jiakun; and LO, David.
Self-admitted technical debts identification: How far are we? (2024). Proceedings of the 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Rovaniemi, Finland, March 12-15. 804-815.
Available at: https://ink.library.smu.edu.sg/sis_research/9260
Additional URL
https://doi.org/10.1109/SANER60148.2024.00087