Research Collection School Of Computing and Information Systems

Chaff from the wheat: Characterizing and determining valid bug reports

Yuanrui FAN, Zhejiang University
Xin XIA, Zhejiang University
David LO, Singapore Management University
Ahmed E. HASSAN, Queen's University - Kingston, Ontario

Publication Type

Journal Article

Version

acceptedVersion

Publication Date

5-2020

Abstract

Developers use bug reports to triage and fix bugs. When triaging a bug report, developers must decide whether the bug report is valid (i.e., a real bug). A large number of bug reports are submitted every day, and many of them end up being invalid. Manually determining which bug reports are valid is a difficult and tedious task. Thus, an approach that can automatically determine whether a bug report is valid can help developers prioritize their triaging tasks and avoid wasting time and effort on invalid bug reports. In this study, motivated by the above needs, we propose an approach that can determine whether a newly submitted bug report is valid. Our approach first extracts 33 features from bug reports. The extracted features are grouped along 5 dimensions, i.e., reporter experience, collaboration network, completeness, readability and text. Based on these features, we use a random forest classifier to identify valid bug reports. To evaluate the effectiveness of our approach, we experiment on large-scale datasets containing a total of 560,697 bug reports from five open source projects (i.e., Eclipse, Netbeans, Mozilla, Firefox and Thunderbird). On average, across the five datasets, our approach achieves F1-scores of 0.74 for valid bug reports and 0.67 for invalid ones. Moreover, our approach achieves an average AUC of 0.81. In terms of AUC and F1-scores for valid and invalid bug reports, our approach statistically significantly outperforms two baselines using features that are proposed by Zanetti et al. [104]. We also study the most important features that distinguish valid bug reports from invalid ones. We find that the textual features of a bug report and the reporter's experience are the most important factors in distinguishing valid bug reports from invalid ones.

Keywords

Bug Report, Collaboration, Computer bugs, Feature extraction, Feature Generation, Forestry, Machine Learning, Software, Support vector machines, Task analysis

Discipline

Databases and Information Systems | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

IEEE Transactions on Software Engineering

Volume

46

Issue

5

First Page

495

Last Page

525

ISSN

0098-5589

Identifier

10.1109/TSE.2018.2864217

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Citation

FAN, Yuanrui; XIA, Xin; LO, David; and HASSAN, Ahmed E. Chaff from the wheat: Characterizing and determining valid bug reports. (2020). IEEE Transactions on Software Engineering. 46, (5), 495-525.
Available at: https://ink.library.smu.edu.sg/sis_research/4103

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/TSE.2018.2864217


Included in

Databases and Information Systems Commons, Software Engineering Commons
