The best of both worlds: integrating semantic features with expert features for defect prediction and localization
Publication Type
Conference Proceeding Article
Publication Date
11-2022
Abstract
To improve software quality, just-in-time defect prediction (JIT-DP) (identifying defect-inducing commits) and just-in-time defect localization (JIT-DL) (identifying defect-inducing code lines in commits) have been widely studied by learning semantic features or expert features respectively, and indeed achieved promising performance. Semantic features and expert features describe code change commits from different aspects, however, the best of the two features have not been fully explored together to boost the just-in-time defect prediction and localization in the literature yet. Additional, JIT-DP identifies defects at the coarse commit level, while as the consequent task of JIT-DP, JIT-DL cannot achieve the accurate localization of defect-inducing code lines in a commit without JIT-DP. We hypothesize that the two JIT tasks can be combined together to boost the accurate prediction and localization of defect-inducing commits by integrating semantic features with expert features. Therefore, we propose to build a unified model, JIT-Fine, for the just-in-time defect prediction and localization by leveraging the best of semantic features and expert features. To assess the feasibility of JIT-Fine, we first build a large-scale line-level manually labeled dataset, JIT-Defects4J. Then, we make a comprehensive comparison with six state-of-the-art baselines under various settings using ten performance measures grouped into two types: effort-agnostic and effort-aware. The experimental results indicate that JIT-Fine can outperform all state-of-the-art baselines on both JIT-DP and JITDL tasks in terms of ten performance measures with a substantial improvement (i.e., 10%-629% in terms of effort-agnostic measures on JIT-DP, 5%-54% in terms of effort-aware measures on JIT-DP, and 4%-117% in terms of effort-aware measures on JIT-DL).
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Singapore, 2022 November 14 - 18
First Page
672
Last Page
683
ISBN
9781450394130
Identifier
10.1145/3540250.3549165
Publisher
Association for Computing Machinery
City or Country
New York
Citation
NI, Chao; WANG, Wei; YANG, Kaiwen; XIA, Xin; LIU, Kui; and LO, David.
The best of both worlds: integrating semantic features with expert features for defect prediction and localization. (2022). Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Singapore, 2022 November 14 - 18. 672-683.
Available at: https://ink.library.smu.edu.sg/sis_research/7729
Additional URL
https://doi.org/10.1145/3540250.3549165