Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
11-2018
Abstract
The interpretation of defect models relies heavily on the software metrics that are used to construct them. However, such software metrics are often correlated in defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce subsets of inconsistent and correlated metrics. In this paper, we investigate the consistency and correlation of the subsets of metrics that are produced by nine commonly used feature selection techniques. Through a case study of 13 publicly available defect datasets, we find that feature selection techniques produce inconsistent subsets of metrics and do not mitigate correlated metrics, suggesting that feature selection techniques should not be used and correlation analyses must be applied when the goal is model interpretation. Since correlation analyses often involve manual selection of metrics by a domain expert, we introduce AutoSpearman, an automated metric selection approach based on correlation analyses. Our evaluation indicates that AutoSpearman yields the highest consistency of subsets of metrics among training samples and mitigates correlated metrics, while impacting model performance by 1-2 percentage points. Thus, to automatically mitigate correlated metrics when interpreting defect models, we recommend that future studies use AutoSpearman in lieu of commonly used feature selection techniques.
Keywords
Correlated Metrics, Defect Prediction, Feature Selection, Model Interpretation, Software Analytics
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
Proceedings of the 34th International Conference on Software Maintenance and Evolution, Madrid, Spain, 2018 September 23-29
First Page
92
Last Page
103
ISBN
9781538678701
Identifier
10.1109/ICSME.2018.00018
Publisher
Institute of Electrical and Electronics Engineers Inc.
City or Country
Madrid, Spain
Citation
JIARPAKDEE, Jirayus; TANTITHAMTHAVORN, Chakkrit; and TREUDE, Christoph.
AutoSpearman: Automatically mitigating correlated software metrics for interpreting defect models. (2018). Proceedings of the 34th International Conference on Software Maintenance and Evolution, Madrid, Spain, 2018 September 23-29. 92-103.
Available at: https://ink.library.smu.edu.sg/sis_research/8829
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/ICSME.2018.00018