Research Collection School Of Computing and Information Systems

A knowledge enhanced Large Language Model for bug localization

Publication Type

Journal Article

Version

publishedVersion

Publication Date

6-2025

Abstract

A significant number of bug reports are generated every day as software systems continue to develop. Large Language Models (LLMs) have been used to correlate bug reports with source code to locate bugs automatically. The existing research has shown that LLMs are effective for bug localization and can increase software development efficiency. However, these studies still have two limitations. First, these models fail to capture context information about bug reports and source code. Second, these models are unable to understand the domain-specific expertise inherent to particular projects, such as version information in projects that are composed of alphanumeric characters without any semantic meaning.

To address these challenges, we propose a Knowledge Enhanced Pre-Trained model using project documents and historical code, called KEPT, for bug localization. Project documents record, revise, and restate project information that provides rich semantic information about those projects. Historical code contains rich code semantic information that can enhance the reasoning ability of LLMs. Specifically, we construct knowledge graphs from project documents and source code. Then, we introduce knowledge graphs to the LLM through soft-position embedding and visible matrices, enhancing its contextual and professional reasoning ability. To validate our model, we conducted a series of experiments on seven open-source software projects with over 6,000 bug reports. Compared with the traditional model (Locus), KEPT performs better by 33.2% to 59.5% in terms of mean reciprocal rank, mean average precision, and Top@N. Compared with the best-performing non-commercial LLM (CodeT5), KEPT achieves an improvement of 36.6% to 63.7%. Compared with the state-of-the-art commercial LLM developed by OpenAI, called text-embedding-ada-002, KEPT achieves an average improvement of 7.8% to 17.4%. The results indicate that introducing knowledge graphs helps enhance the effectiveness of the LLM in bug localization.
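
The abstract reports results in terms of mean reciprocal rank (MRR), mean average precision (MAP), and Top@N. As a minimal sketch of how these ranking metrics are commonly computed for bug localization, the following Python may help; the function names and the toy file names are hypothetical and not taken from the paper, and average precision is assumed here to be normalized by the number of truly buggy files.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1 / rank of the first relevant file, or 0.0 if none is ranked."""
    for rank, name in enumerate(ranked, start=1):
        if name in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average the precision at each rank where a relevant file appears.

    Assumption: the sum is divided by the number of relevant files.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def top_at_n(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """Return 1.0 if any relevant file is among the first n results, else 0.0."""
    return 1.0 if any(name in relevant for name in ranked[:n]) else 0.0


def evaluate(results: List[Tuple[Sequence[str], Set[str]]], n: int) -> dict:
    """Average the three metrics over all bug reports (one tuple per report)."""
    count = len(results)
    return {
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in results) / count,
        "MAP": sum(average_precision(r, rel) for r, rel in results) / count,
        f"Top@{n}": sum(top_at_n(r, rel, n) for r, rel in results) / count,
    }


# Toy data (hypothetical): each entry is (ranked candidate files, buggy files).
toy_results = [
    (["a.java", "b.java", "c.java"], {"b.java"}),
    (["x.java", "y.java", "z.java"], {"x.java", "z.java"}),
]
scores = evaluate(toy_results, n=1)
```

Each bug report contributes one ranked list of candidate files; the metrics are then averaged over all reports, so a higher value means buggy files tend to appear nearer the top of the ranking.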

Keywords

large language model, knowledge enhancement, bug localization, information retrieval

Discipline

Artificial Intelligence and Robotics | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Proceedings of the ACM on Software Engineering

Volume

2

Issue

FSE

First Page

1914

Last Page

1936

Identifier

10.1145/3729356

Publisher

Association for Computing Machinery

Citation

LI, Yue; LIU, Bohan; ZHANG, Ting; WANG, Zhiqi; LO, David; YANG, Lanxin; LYU, Jun; and ZHANG, He. A knowledge enhanced Large Language Model for bug localization. (2025). Proceedings of the ACM on Software Engineering. 2, (FSE), 1914-1936.
Available at: https://ink.library.smu.edu.sg/sis_research/10890

Copyright Owner and License

Authors-CC-BY

Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 License.

Additional URL

https://doi.org/10.1145/3729356

Included in

Artificial Intelligence and Robotics Commons, Software Engineering Commons
