Publication Type

Journal Article

Version

publishedVersion

Publication Date

6-2025

Abstract

A significant number of bug reports are generated every day as software systems continue to develop. Large Language Models (LLMs) have been used to correlate bug reports with source code to locate bugs automatically. Existing research has shown that LLMs are effective for bug localization and can increase software development efficiency. However, these studies still have two limitations. First, the models fail to capture context information about bug reports and source code. Second, the models are unable to understand the domain-specific expertise inherent to particular projects, such as version identifiers that are composed of alphanumeric characters and carry no semantic meaning. To address these challenges, we propose a Knowledge Enhanced Pre-Trained model using project documents and historical code, called KEPT, for bug localization. Project documents record, revise, and restate project information, and so provide rich semantic information about those projects. Historical code contains rich code semantic information that can enhance the reasoning ability of LLMs. Specifically, we construct knowledge graphs from project documents and source code. Then, we introduce the knowledge graphs into the LLM through soft-position embedding and visible matrices, enhancing its contextual and professional reasoning ability. To validate our model, we conducted a series of experiments on seven open-source software projects with over 6,000 bug reports. Compared with the traditional model (Locus), KEPT performs better by 33.2% to 59.5% in terms of mean reciprocal rank, mean average precision, and Top@N. Compared with the best-performing non-commercial LLM (CodeT5), KEPT achieves an improvement of 36.6% to 63.7%. Compared with the state-of-the-art commercial LLM developed by OpenAI, text-embedding-ada-002, KEPT achieves an average improvement of 7.8% to 17.4%.
The results indicate that introducing knowledge graphs helps enhance the effectiveness of the LLM in bug localization.
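
The abstract reports results in terms of mean reciprocal rank (MRR), mean average precision (MAP), and Top@N. The sketch below shows how these three metrics are commonly computed for a ranked list of candidate files per bug report. It is not taken from the KEPT implementation: the function names, the input shape (a ranked list of file paths paired with the set of files actually fixed), and the choice to normalize average precision by the number of relevant files are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant file, or 0.0 if none is ranked."""
    for rank, path in enumerate(ranked, start=1):
        if path in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average the precision measured at each rank holding a relevant file.

    Assumption: the sum is divided by the total number of relevant files.
    Some studies divide by the number of relevant files retrieved instead.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, path in enumerate(ranked, start=1):
        if path in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def top_at_n(ranked: Sequence[str], relevant: Set[str], n: int) -> bool:
    """Return True if at least one relevant file is among the first n results."""
    return any(path in relevant for path in ranked[:n])


def evaluate(results: List[Tuple[Sequence[str], Set[str]]], n: int = 5) -> Dict[str, float]:
    """Aggregate the three metrics over (ranked files, fixed files) pairs."""
    if not results:
        raise ValueError("results must contain at least one bug report")
    count = len(results)
    return {
        "MRR": sum(reciprocal_rank(r, g) for r, g in results) / count,
        "MAP": sum(average_precision(r, g) for r, g in results) / count,
        f"Top@{n}": sum(top_at_n(r, g, n) for r, g in results) / count,
    }


if __name__ == "__main__":
    # Hypothetical rankings for two bug reports; the file names are made up.
    demo = [
        (["a.java", "b.java", "c.java"], {"b.java"}),
        (["x.java", "y.java", "z.java"], {"x.java", "z.java"}),
    ]
    print(evaluate(demo, n=1))
```

Each metric is averaged over bug reports, so a single report with many fixed files does not dominate the score.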

Keywords

large language model, knowledge enhancement, bug localization, information retrieval

Discipline

Artificial Intelligence and Robotics | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Proceedings of the ACM on Software Engineering

Volume

2

Issue

FSE

First Page

1914

Last Page

1936

Identifier

10.1145/3729356

Publisher

Association for Computing Machinery

Copyright Owner and License

Authors-CC-BY

Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 License.

Additional URL

https://doi.org/10.1145/3729356
