Research Collection School Of Computing and Information Systems

HYDRA: Massively compositional model for cross-project defect prediction

Publication Type

Journal Article

Version

publishedVersion

Publication Date

10-2016

Abstract

Most software defect prediction approaches are trained and applied on data from the same project. However, a new project often does not have enough training data. Cross-project defect prediction, which uses data from other projects to predict defects in a particular project, provides a new perspective on defect prediction. In this work, we propose a HYbrid moDel Reconstruction Approach (HYDRA) for cross-project defect prediction, which includes two phases: a genetic algorithm (GA) phase and an ensemble learning (EL) phase. Together, these two phases create a massive composition of classifiers. To examine the benefits of HYDRA, we perform experiments on 29 datasets from the PROMISE repository, which contain a total of 11,196 instances (i.e., Java classes) labeled as defective or clean. We experiment with logistic regression as the underlying classification algorithm of HYDRA. We compare our approach with the most recently proposed cross-project defect prediction approaches: TCA+ by Nam et al., Peters filter by Peters et al., GP by Liu et al., MO by Canfora et al., and CODEP by Panichella et al. Our results show that HYDRA achieves an average F1-score of 0.544. On average, across the 29 datasets, these results correspond to an improvement in the F1-scores of 26.22%, 34.99%, 47.43%, 28.61%, and 30.14% over TCA+, Peters filter, GP, MO, and CODEP, respectively. In addition, HYDRA on average can discover 33% of all bugs if developers inspect the top 20% lines of code, which improves on the best baseline approach (TCA+) by 44.41%. We also find that HYDRA improves the F1-score of Zero-R, which predicts all instances to be defective, by 5.42%, and improves on Zero-R by 58.65% when inspecting the top 20% lines of code. In practice, Zero-R can be hard to use since it simply predicts all of the instances to be defective, and thus developers have to inspect all of the instances to find the defective ones. Moreover, we notice that the improvements of HYDRA over the other baseline approaches, in terms of F1-score and when inspecting the top 20% lines of code, are substantial, and in most cases the improvements are significant and have large effect sizes across the 29 datasets.
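The two evaluation measures named in the abstract, the F1-score and the share of bugs found when inspecting the top 20% lines of code, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy data, the choice to rank instances by predicted score alone, and the choice to count each defective instance as one bug are all assumptions made for illustration.

```python
from typing import List


def f1_score(actual: List[bool], predicted: List[bool]) -> float:
    """Harmonic mean of precision and recall for the 'defective' class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bugs_found_within_loc_budget(
    scores: List[float],
    loc: List[int],
    defective: List[bool],
    budget: float = 0.20,
) -> float:
    """Fraction of defective instances reached when inspecting instances in
    descending score order until `budget` of the total lines of code is used.

    Assumption (hypothetical): one defective instance counts as one bug.
    """
    total_defective = sum(defective)
    if total_defective == 0:
        return 0.0
    limit = budget * sum(loc)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    inspected = 0
    found = 0
    for i in order:
        if inspected + loc[i] > limit:
            break  # the next instance would exceed the inspection budget
        inspected += loc[i]
        found += int(defective[i])
    return found / total_defective


# Toy data (hypothetical): four classes, two of them defective.
actual = [True, False, True, False]
zero_r = [True] * len(actual)          # Zero-R: predict everything defective
scores = [0.9, 0.8, 0.3, 0.1]          # predicted defect likelihoods
loc = [10, 40, 30, 20]                 # lines of code per class

zero_r_f1 = f1_score(actual, zero_r)
found_20 = bugs_found_within_loc_budget(scores, loc, actual)
```

On this toy data, Zero-R reaches perfect recall but only 50% precision, which is why its F1-score can look competitive even though, as the abstract notes, developers would still have to inspect every instance.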

Keywords

Ensemble Learning, Cross-project Defect Prediction, Transfer Learning, Genetic Algorithm

Discipline

Computer Sciences | Software Engineering | Theory and Algorithms

Research Areas

Software and Cyber-Physical Systems

Publication

IEEE Transactions on Software Engineering

Volume

42

Issue

10

First Page

977

Last Page

998

ISSN

0098-5589

Identifier

10.1109/TSE.2016.2543218

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Citation

XIA, Xin; LO, David; PAN, Sinno Jialin; NAGAPPAN, Nachiappan; and WANG, Xinyu. HYDRA: Massively compositional model for cross-project defect prediction. (2016). IEEE Transactions on Software Engineering. 42, (10), 977-998.
Available at: https://ink.library.smu.edu.sg/sis_research/3415

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://doi.ieeecomputersociety.org/10.1109/TSE.2016.2543218

Included in

Software Engineering Commons, Theory and Algorithms Commons
