Research Collection School Of Computing and Information Systems

On the influence of biases in bug localization: evaluation and benchmark

Ratnadira WIDYASARI
Stefanus Agus HARYONO
Ferdian THUNG
Jieke SHI
Constance TAN
Fiona WEE
Jack PHAN
David LO, Singapore Management University

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

3-2022

Abstract

Bug localization is the task of identifying the parts of the source code that need to be changed to resolve a bug report. As this task is difficult, automatic bug localization tools have been proposed. The development and evaluation of these tools rely on the availability of high-quality bug report datasets. In 2014, Kochhar et al. identified three biases in datasets used to evaluate bug localization techniques: (1) misclassified bug reports, (2) already localized bug reports, and (3) incorrect ground truth files in a bug report. They reported that already localized bug reports statistically significantly and substantially impact bug localization results, and thus should be removed. However, their evaluation is limited, as they only investigated 3 projects written in Java. In this study, we replicate the study of Kochhar et al. on the effect of biases in bug report datasets for bug localization. Further investigation on this topic is necessary as new and larger bug report datasets have been proposed without being checked for these biases.

We conduct our analysis on a collection of 2,913 bug reports, taken from the recently released Bugzbook dataset, whose fixes change Python files. To investigate the prevalence of the biases, we check the bias distributions. For each bias, we select and label a set of bug reports that may contain the bias and compute the proportion of bug reports in the set that exhibit the bias. We find that 5%, 23%, and 30% of the bug reports that we investigated are affected by biases 1, 2, and 3 respectively. Then, we investigate the effect of the three biases on bug localization by measuring the performance of IncBL, a recent bug localization tool, and the classical Vector Space Model (VSM) based bug localization tool, which was used in the Kochhar et al. study. Our experiment results highlight that bias 2 significantly impacts the bug localization results, while biases 1 and 3 do not have a significant impact. We also find that the effect size of bias 2 differs between IncBL and VSM, with IncBL showing a higher effect size than VSM. Our findings corroborate the result reported by Kochhar et al. and demonstrate that bias 2 affects not only the 3 Java projects investigated in their study, but also projects in another programming language (i.e., Python). This highlights the need to eliminate bias 2 from the evaluation of future bug localization tools. As a by-product of our replication study, we have released a benchmark dataset, which we refer to as CAPTURED, that has been cleaned of the three biases. CAPTURED contains Python programs and therefore augments the cleaned dataset released by Kochhar et al., which only contains Java programs.
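
To make the VSM baseline mentioned in the abstract concrete, the following is a minimal sketch of ranking source files against a bug report by TF-IDF cosine similarity. It is an illustration only, not the tool used in the study: the function names (`tokenize`, `rank_files`), the tokenizer, the smoothed IDF formula, and the toy file contents are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def rank_files(bug_report, files):
    """Rank files by TF-IDF cosine similarity to the bug report text.

    `files` maps a file path to its content (hypothetical input format).
    Returns (path, score) pairs, highest score first.
    """
    docs = {path: Counter(tokenize(body)) for path, body in files.items()}
    n_docs = len(docs)

    # Document frequency: in how many files each term occurs.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    def weigh(counts):
        # Smoothed IDF (an assumption of this sketch) keeps weights positive.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, tf in counts.items()
        }

    def cosine(a, b):
        dot = sum(w * b.get(term, 0.0) for term, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query = weigh(Counter(tokenize(bug_report)))
    scores = {path: cosine(query, weigh(c)) for path, c in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy example with made-up file contents.
toy_files = {
    "parser.py": "def parse_config(path): read yaml config file",
    "network.py": "def open_socket(host, port): connect tcp socket",
    "utils.py": "def helper(): pass",
}
ranked = rank_files("Crash when parsing yaml config file", toy_files)
```

In this toy setup, `ranked[0][0]` is `"parser.py"`, because it is the only file sharing terms with the report. An already localized report (bias 2) would typically name the file to fix outright, which is one intuition for why such reports can inflate the scores of text-matching approaches like this one.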

Keywords

Bias, Bug localization, Bug report, Python

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems | Information Security | Programming Languages and Compilers

Research Areas

Data Science and Engineering; Cybersecurity

Publication

Proceedings of the 29th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2022, Honolulu, HI, USA, March 15-18, 2022

First Page

128

Last Page

139

ISBN

9781665437868

Identifier

10.1109/SANER53432.2022.00027

Publisher

IEEE

City or Country

Honolulu, HI, USA

Citation

WIDYASARI, Ratnadira; HARYONO, Stefanus Agus; THUNG, Ferdian; SHI, Jieke; TAN, Constance; WEE, Fiona; PHAN, Jack; and LO, David. On the influence of biases in bug localization: evaluation and benchmark. (2022). Proceedings of the 29th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2022, Honolulu, HI, USA, March 15-18, 2022. 128-139.
Available at: https://ink.library.smu.edu.sg/sis_research/7655

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://doi.org/10.1109/SANER53432.2022.00027

Included in

Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Information Security Commons, Programming Languages and Compilers Commons
