Publication Type

Journal Article

Version

acceptedVersion

Publication Date

11-2024

Abstract

Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early stages. To bridge this gap, we conducted a Systematic Literature Review (SLR) on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes. We selected and analyzed 395 research articles from January 2017 to January 2024 to answer four key Research Questions (RQs). In RQ1, we categorize different LLMs that have been employed in SE tasks, characterizing their distinctive features and uses. In RQ2, we analyze the methods used in data collection, pre-processing, and application, highlighting the role of well-curated datasets for successful LLM4SE implementations. RQ3 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE. Finally, RQ4 examines the specific SE tasks where LLMs have shown success to date, illustrating their practical contributions to the field. From the answers to these RQs, we discuss the current state of the art and trends, identify gaps in existing research, and highlight promising areas for future study. Our artifacts are publicly available at https://github.com/security-pride/LLM4SE_SLR.

Keywords

Software Engineering, Large Language Model, Survey

Discipline

Artificial Intelligence and Robotics | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

ACM Transactions on Software Engineering and Methodology

Volume

33

Issue

8

First Page

1

Last Page

79

ISSN

1049-331X

Identifier

10.1145/3695988

Publisher

Association for Computing Machinery (ACM)

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1145/3695988
