Research Collection School Of Computing and Information Systems

Combined classifier for cross-project defect prediction: An extended empirical study

Publication Type

Journal Article

Version

publishedVersion

Publication Date

4-2018

Abstract

To help developers allocate their testing and debugging efforts effectively, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict which classes are more likely to be buggy based on the past history of classes, methods, or certain other code elements. They are effective provided that a sufficient amount of data is available to train a prediction model. However, sufficient training data are rarely available for new software projects. To resolve this problem, cross-project defect prediction, which transfers a prediction model trained using data from one project to another, was proposed and is regarded as a new challenge in the area of defect prediction. Thus far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this study, we investigated seven composite algorithms that integrate multiple machine learning classifiers to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, we performed experiments on 10 open-source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. We compared the composite algorithms with CODEP (Logistic), the combined defect predictor that uses logistic regression as the meta classification algorithm and the most recent cross-project defect prediction algorithm, in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEP (Logistic). Maximum voting shows the best performance in terms of F-measure, and its average F-measure is superior to that of CODEP (Logistic) by 36.88%. Bootstrap aggregation (Bagging (J48)) shows the best performance in terms of cost effectiveness, and its average cost effectiveness is superior to that of CODEP (Logistic) by 15.34%.

Keywords

defect prediction;cross-project;classifier combination

Discipline

Software Engineering | Systems Architecture

Research Areas

Data Science and Engineering

Publication

Frontiers of Computer Science -Springer-

Volume

12

Issue

2

First Page

280

Last Page

296

ISSN

2095-2228

Identifier

10.1007/s11704-017-6015-y

Publisher

Springer Science+Business Media

Citation

ZHANG, Yun; LO, David; XIA, Xin; and SUN, Jianling. Combined classifier for cross-project defect prediction: An extended empirical study. (2018). Frontiers of Computer Science -Springer-. 12, (2), 280-296.
Available at: https://ink.library.smu.edu.sg/sis_research/4128

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1007/s11704-017-6015-y

Included in

Software Engineering Commons, Systems Architecture Commons
