Publication Type
Journal Article
Version
publishedVersion
Publication Date
6-2025
Abstract
A significant number of bug reports are generated every day as software systems continue to evolve. Large Language Models (LLMs) have been used to correlate bug reports with source code to locate bugs automatically. Existing research has shown that LLMs are effective for bug localization and can increase software development efficiency. However, these studies still have two limitations. First, the models fail to capture contextual information about bug reports and source code. Second, they are unable to understand domain-specific expertise inherent to particular projects, such as version identifiers composed of alphanumeric characters without any semantic meaning. To address these challenges, we propose a Knowledge Enhanced Pre-Trained model, called KEPT, that leverages project documents and historical code for bug localization. Project documents record, revise, and restate project information, providing rich semantic information about those projects. Historical code contains rich code semantics that can enhance the reasoning ability of LLMs. Specifically, we construct knowledge graphs from project documents and source code, then introduce them to the LLM through soft-position embeddings and visible matrices, enhancing its contextual and domain-specific reasoning ability. To validate our model, we conducted a series of experiments on seven open-source software projects with over 6,000 bug reports. Compared with the traditional model (Locus), KEPT performs better by 33.2% to 59.5% in terms of mean reciprocal rank, mean average precision, and Top@N. Compared with the best-performing non-commercial LLM (CodeT5), KEPT achieves an improvement of 36.6% to 63.7%. Compared to the state-of-the-art commercial LLM developed by OpenAI, text-embedding-ada-002, KEPT achieves an average improvement of 7.8% to 17.4%.
The results indicate that introducing knowledge graphs enhances the effectiveness of the LLM in bug localization.
Keywords
large language model, knowledge enhancement, bug localization, information retrieval
Discipline
Artificial Intelligence and Robotics | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
Proceedings of the ACM on Software Engineering
Volume
2
Issue
FSE
First Page
1914
Last Page
1936
Identifier
10.1145/3729356
Publisher
Association for Computing Machinery
Citation
LI, Yue; LIU, Bohan; ZHANG, Ting; WANG, Zhiqi; LO, David; YANG, Lanxin; LYU, Jun; and ZHANG, He.
A knowledge enhanced Large Language Model for bug localization. (2025). Proceedings of the ACM on Software Engineering. 2, (FSE), 1914-1936.
Available at: https://ink.library.smu.edu.sg/sis_research/10890
Copyright Owner and License
Authors-CC-BY
Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 License.
Additional URL
https://doi.org/10.1145/3729356