Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
8-2020
Abstract
When developers use keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD in 33 software repositories containing 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. KL-SATD comments contain words expressing code changes and uncertainty, such as remove, fix, maybe and probably, which distinguishes them from other comments. The contents of KL-SATD comments are similar to those of manually labeled SATD comments from prior work. Our machine learning classifier, based on logistic Lasso regression, performs well in detecting KL-SATD comments (AUC-ROC 0.88). Finally, we demonstrate that machine learning can identify comments that currently lack a SATD keyword but should contain one. Automatically identifying SATD in comments that lack SATD keywords can save the time and effort of manual identification. Using KL-SATD offers the potential to bootstrap a complete SATD detector.
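The keyword labeling and prevalence measure described in the abstract can be illustrated with a minimal sketch. It assumes that a comment counts as KL-SATD when it contains TODO or FIXME as a whole, case-sensitive word; the study's full keyword list and exact matching rules are not given in this record, so `SATD_KEYWORDS`, `is_kl_satd`, `kl_satd_percentage`, `median_prevalence` and the sample `repos` data are all hypothetical.

```python
import re
from statistics import median

# Hypothetical keyword set: TODO and FIXME are the examples named in the
# abstract; the study's complete keyword list is not reproduced here.
SATD_KEYWORDS = ("TODO", "FIXME")

# Assumption: match each keyword as a whole word, case-sensitively.
_KEYWORD_RE = re.compile(r"\b(?:" + "|".join(SATD_KEYWORDS) + r")\b")


def is_kl_satd(comment: str) -> bool:
    """Return True if the comment contains one of the SATD keywords."""
    return _KEYWORD_RE.search(comment) is not None


def kl_satd_percentage(comments: list) -> float:
    """Percentage of one repository's comments that are keyword-labeled."""
    if not comments:
        return 0.0
    labeled = sum(1 for c in comments if is_kl_satd(c))
    return 100.0 * labeled / len(comments)


def median_prevalence(repos: dict) -> float:
    """Median KL-SATD percentage across repositories (one value per repo)."""
    return median(kl_satd_percentage(cs) for cs in repos.values())


# Hypothetical sample data: comments grouped by repository.
repos = {
    "repo_a": ["// TODO: remove this hack", "// returns the user id", "// helper"],
    "repo_b": ["# FIXME maybe wrong for empty input", "# parse config"],
    "repo_c": ["/* utility */", "/* constants */", "/* main loop */", "/* init */"],
}

if __name__ == "__main__":
    for name, comments in repos.items():
        print(name, round(kl_satd_percentage(comments), 2))
    print("median", round(median_prevalence(repos), 2))
```

Taking the median of per-repository percentages, rather than pooling all comments, keeps a few very large repositories from dominating the prevalence figure. This mirrors the "median percentage" wording of the abstract, though the paper's exact aggregation procedure is assumed here.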
Keywords
data mining, Natural language processing, self-admitted technical debt
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA): 26-28 August, Slovenia: Proceedings
First Page
385
Last Page
388
ISBN
9781728195322
Identifier
10.1109/SEAA51224.2020.00069
Publisher
IEEE Computer Society
City or Country
Los Alamitos, CA
Citation
RANTALA, Leevi; MÄNTYLÄ, Mika; and LO, David.
Prevalence, Contents and Automatic Detection of KL-SATD. (2020). 2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA): 26-28 August, Slovenia: Proceedings. 385-388.
Available at: https://ink.library.smu.edu.sg/sis_research/5624
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/SEAA51224.2020.00069