Research Collection School Of Computing and Information Systems

High impact bug report identification with imbalanced learning strategies

Xinli YANG, Zhejiang University
David LO, Singapore Management University
Xin XIA, Zhejiang University
Qiao HUANG, Zhejiang University
Jianling SUN, Zhejiang University

Publication Type

Journal Article

Version

publishedVersion

Publication Date

1-2017

Abstract

In practice, some bugs have more impact than others and thus deserve more immediate attention. Because of tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on the highly impactful ones. In the literature, high-impact bugs refer to bugs that appear at unexpected times or locations and bring unexpected effects (i.e., surprise bugs), or that break pre-existing functionality and damage the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs among the thousands of bug reports in a bug tracking system is not an easy feat. An automated technique that identifies high-impact bug reports can therefore help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Because only a small proportion of bugs are high-impact, the data are imbalanced and the identification task is difficult. In this paper, we propose an approach that identifies high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of a range of variants, each of which combines one imbalanced learning strategy with one classification algorithm. Specifically, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms, and conduct experiments on four datasets from four different open source projects. Our study focuses on two types of high-impact bugs: surprise bugs and breakage bugs. The results show that the variants perform differently, and that the best-performing variants, SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification, achieve higher F1-scores than the two state-of-the-art approaches by Thung et al. and Garcia and Shihab.
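
To make the idea of a "variant" concrete, the following is a minimal sketch of one combination named in the abstract, RUS + NB, together with the F1-score used for comparison. It is not the authors' implementation (their code is linked under Comments); the toy bug reports, the labels, the whitespace tokenizer, and the names random_under_sample, NaiveBayes, and f1_score are all assumptions made for illustration, using only the Python standard library.

```python
import math
import random
from collections import Counter, defaultdict


def random_under_sample(texts, labels, seed=0):
    """RUS: randomly drop examples until every class matches the smallest one."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = min(len(items) for items in by_label.values())
    sampled = []
    for label, items in by_label.items():
        for text in rng.sample(items, target):
            sampled.append((text, label))
    rng.shuffle(sampled)
    return [t for t, _ in sampled], [l for _, l in sampled]


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in text.lower().split():
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = high-impact (minority), 0 = other (majority).
texts = [
    "crash on startup after update",
    "update broke existing login",
    "typo in help text",
    "button color slightly off",
    "minor label misaligned",
    "tooltip text outdated",
]
labels = [1, 1, 0, 0, 0, 0]

balanced_texts, balanced_labels = random_under_sample(texts, labels)
model = NaiveBayes().fit(balanced_texts, balanced_labels)
predictions = [model.predict(t) for t in texts]
print(Counter(balanced_labels), f1_score(labels, predictions))
```

Under-sampling is applied to the training data only; a real evaluation would score the classifier on held-out reports rather than on the training texts as this toy run does.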

Keywords

high-impact bug, imbalanced learning, bug report identification

Discipline

Databases and Information Systems | Information Security

Research Areas

Data Science and Engineering

Publication

Journal of Computer Science and Technology

Volume

32

Issue

1

First Page

181

Last Page

198

ISSN

1000-9000

Identifier

10.1007/s11390-017-1713-3

Publisher

Springer Verlag (Germany)

Citation

YANG, Xinli; LO, David; XIA, Xin; HUANG, Qiao; and SUN, Jianling. High impact bug report identification with imbalanced learning strategies. (2017). Journal of Computer Science and Technology. 32, (1), 181-198.
Available at: https://ink.library.smu.edu.sg/sis_research/3702

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Comments

Supplementary code and data available from GitHub:

https://github.com/goddding/JCST

Additional URL

https://doi.org/10.1007/s11390-017-1713-3

Included in

Databases and Information Systems Commons, Information Security Commons
