Research Collection School Of Computing and Information Systems

Duplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

9-2012

Abstract

Detecting duplicate bug reports helps reduce triaging effort and saves developers the time of fixing the same issue more than once. Among several automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform others in terms of both accuracy and time efficiency. However, those IR-based approaches do not reliably detect duplicate reports that describe the same technical issue in different terms. This paper introduces DBTM, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features. DBTM models a bug report as a textual document describing one or more technical issues, and models duplicate bug reports as reports about the same technical issue(s). Trained with historical data that includes identified duplicate reports, it learns the sets of different terms that describe the same technical issues and detects other, not-yet-identified duplicates. Our empirical evaluation on real-world systems shows that DBTM improves on state-of-the-art approaches by up to 20% in accuracy.
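The idea of combining term-based and topic-based evidence can be illustrated with a minimal sketch. This is not DBTM's actual model, which the abstract does not specify: the cosine similarity, the linear weighting `alpha`, the function names, and the hand-assigned topic proportions below are all assumptions made for illustration. In a real system the topic proportions would come from a trained topic model.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def term_similarity(text_a, text_b):
    """IR-style feature: cosine over raw term counts (a stand-in for BM25 or TF-IDF)."""
    return cosine(Counter(text_a.lower().split()), Counter(text_b.lower().split()))


def combined_score(report_a, report_b, alpha=0.5):
    """Hypothetical linear mix of term similarity and topic similarity."""
    term = term_similarity(report_a["text"], report_b["text"])
    topic = cosine(report_a["topics"], report_b["topics"])
    return alpha * term + (1.0 - alpha) * topic


def rank(new_report, candidates, alpha):
    """Return candidate ids ordered from most to least likely duplicate."""
    scored = sorted(
        candidates,
        key=lambda c: combined_score(new_report, c, alpha),
        reverse=True,
    )
    return [c["id"] for c in scored]


# Toy data: topic proportions are hand-assigned, not learned.
NEW_REPORT = {
    "id": "new",
    "text": "app crashes when saving file",
    "topics": {"file_io": 0.9, "ui": 0.1},
}
CANDIDATES = [
    {   # Same issue, different wording.
        "id": "same-issue",
        "text": "program terminates unexpectedly during document export",
        "topics": {"file_io": 0.85, "ui": 0.15},
    },
    {   # Similar wording, different issue.
        "id": "other-issue",
        "text": "app crashes when opening settings",
        "topics": {"ui": 0.9, "file_io": 0.1},
    },
]

if __name__ == "__main__":
    print("terms only:", rank(NEW_REPORT, CANDIDATES, alpha=1.0))
    print("combined:  ", rank(NEW_REPORT, CANDIDATES, alpha=0.5))
```

With `alpha=1.0` only shared terms count, so the report with similar wording but a different issue ranks first. With `alpha=0.5` the topic evidence moves the differently worded report about the same issue to the top, which is the gap in IR-only approaches that the abstract describes.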

Keywords

Duplicate Bug Reports, Topic Model, Information Retrieval

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

ASE 2012: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, 3-7 September, Essen, Germany

First Page

70

Last Page

79

ISBN

9781450312042

Identifier

10.1145/2351676.2351687

Publisher

ACM

City or Country

New York

Citation

NGUYEN, Anh Tuan; NGUYEN, Tung; NGUYEN, Tien; LO, David; and SUN, Chengnian. Duplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling. (2012). ASE 2012: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, 3-7 September, Essen, Germany. 70-79.
Available at: https://ink.library.smu.edu.sg/sis_research/1571

Copyright Owner and License

Publisher

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Comments

Won ACM SIGSOFT Distinguished Paper Award

Additional URL

https://doi.org/10.1145/2351676.2351687

Included in

Software Engineering Commons
