Research Collection School Of Computing and Information Systems

Duplicate bug report detection: How far are we?

Ting ZHANG, Singapore Management University
DongGyun HAN, Singapore Management University
Venkatesh VINAYAKARAO
Ivana Clairine IRSAN, Singapore Management University
Bowen XU, Singapore Management University
Thung Ferdian, Singapore Management University
David LO, Singapore Management University
Lingxiao JIANG, Singapore Management University

Publication Type

Journal Article

Version

publishedVersion

Publication Date

7-2023

Abstract

Many Duplicate Bug Report Detection (DBRD) techniques have been proposed in the research literature, while industry uses other techniques. Unfortunately, there has been insufficient comparison among them, and it is unclear how far we have come. This work fills this gap by comparing the aforementioned techniques. To compare them, we first need a benchmark that can estimate how a tool would perform if applied in a realistic setting today. Thus, we first investigated potential biases that affect the fair comparison of the accuracy of DBRD techniques. Our experiments suggest that data age and issue tracking system choice cause a significant difference. Based on these findings, we prepared a new benchmark. We then used it to evaluate DBRD techniques to better estimate how far we have come. Surprisingly, a simpler technique outperforms recently proposed sophisticated techniques on most projects in our benchmark. In addition, we compared the DBRD techniques proposed in research with those used in Mozilla and VSCode. Surprisingly, we observe that a simple technique already adopted in practice can achieve results comparable to those of a recently proposed research tool. Our study reflects on the current state of DBRD, and we share our insights to benefit future DBRD research.

Keywords

Bug Reports, Duplicate Bug Report Detection, Deep Learning, Empirical Study

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

ACM Transactions on Software Engineering and Methodology

Volume

32

Issue

4

First Page

1

Last Page

32

ISSN

1049-331X

Identifier

10.1145/3576042

Publisher

ACM

Citation

ZHANG, Ting; HAN, DongGyun; VINAYAKARAO, Venkatesh; IRSAN, Ivana Clairine; XU, Bowen; Ferdian, Thung; LO, David; and JIANG, Lingxiao. Duplicate bug report detection: How far are we? (2023). ACM Transactions on Software Engineering and Methodology. 32, (4), 1-32.
Available at: https://ink.library.smu.edu.sg/sis_research/7788

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Additional URL

https://doi.org/10.1145/3576042

Included in

Software Engineering Commons
