Publication Type

Journal Article

Publication Date



Abstract

Context: Defect prediction is a very meaningful topic, particularly at the change level. Change-level defect prediction, also referred to as just-in-time defect prediction, can not only ensure software quality during the development process but also help developers check and fix defects in time [1].

Objective: Ensemble learning has become a hot topic in recent years, and there have been several studies on applying ensemble learning to defect prediction [2–5]. Traditional ensemble learning approaches have only one layer, i.e., they use ensemble learning once; few studies leverage ensemble learning twice or more. To bridge this research gap, we try to hybridize various ensemble learning methods to see whether this improves the performance of just-in-time defect prediction. In particular, we focus on one way to do this, hybridizing bagging and stacking together, and leave other possible hybridization strategies for future work.

Method: In this paper, we propose a two-layer ensemble learning approach, TLEL, which leverages decision trees and ensemble learning to improve the performance of just-in-time defect prediction. In the inner layer, we combine decision trees and bagging to build a Random Forest model. In the outer layer, we use random under-sampling to train many different Random Forest models and use stacking to ensemble them once more.

Results: To evaluate the performance of TLEL, we use two metrics, i.e., cost effectiveness and F1-score. We perform experiments on datasets from six large open source projects, i.e., Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL, containing a total of 137,417 changes. We also compare our approach with three baselines: Deeper, the approach proposed by us [6]; DNC, the approach proposed by Wang et al. [2]; and MKEL, the approach proposed by Wang et al. [3]. The experimental results show that, on average across the six datasets, TLEL can discover over 70% of the bugs by reviewing only 20% of the lines of code, compared with about 50% for the baselines. In addition, the F1-scores TLEL achieves are substantially and statistically significantly higher than those of the three baselines across the six datasets.

Conclusion: TLEL achieves a substantial and statistically significant improvement over the state-of-the-art methods, i.e., Deeper, DNC, and MKEL. Moreover, TLEL can discover over 70% of the bugs by reviewing only 20% of the lines of code.
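The two-layer design described in the Method paragraph can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the paper's exact configuration: the number of base models, all hyperparameters, the use of a logistic-regression meta-learner for the stacking step, and the simplification of fitting the meta-learner on in-sample predictions are assumptions made for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy imbalanced change-level dataset: one row per code change, label 1 = buggy.
X, y = make_classification(n_samples=2000, n_features=14,
                           weights=[0.9, 0.1], random_state=0)

def under_sample(X, y, rng):
    """Random under-sampling: keep all buggy changes and an equal-sized
    random subset of the clean ones."""
    pos = np.flatnonzero(y == 1)
    neg = rng.choice(np.flatnonzero(y == 0), size=len(pos), replace=False)
    idx = np.concatenate([pos, neg])
    return X[idx], y[idx]

# Inner layer: each base model is a Random Forest, i.e., decision trees
# combined via bagging. Each forest sees a different under-sampled subset.
n_models = 5
forests = []
for _ in range(n_models):
    Xs, ys = under_sample(X, y, rng)
    rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xs, ys)
    forests.append(rf)

# Outer layer: stack the forests by feeding their predicted defect
# probabilities into a meta-learner (a hypothetical logistic regression here).
meta_features = np.column_stack([rf.predict_proba(X)[:, 1] for rf in forests])
meta = LogisticRegression().fit(meta_features, y)

pred = meta.predict(meta_features)  # final defect predictions per change
```

In a full implementation the meta-learner would be fit on held-out (e.g., cross-validated) base-model predictions rather than in-sample ones, to avoid leaking the training labels into the stacking step.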


Keywords

Ensemble learning, Just-in-time defect prediction, Cost effectiveness


Information Security | OS and Networks

Research Areas



Information and Software Technology



First Page


Last Page








Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Additional URL