"Elevating automated software maintenance tasks with large language mod" by Xin ZHOU

Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

11-2024

Abstract

Software engineering involves many tasks across different phases such as requirements, design, implementation, testing, and maintenance. Among them, software maintenance is a crucial phase, typically accounting for more than half of the software life cycle's duration.
To boost developer productivity, numerous research efforts in software engineering in recent years have sought to automate software maintenance tasks using machine learning techniques.
Since 2020, the emergence of advanced Large Language Models (LLMs) of code has opened new avenues for enhancing automated solutions in software maintenance.
This dissertation presents a series of works aimed at advancing automated solutions for software maintenance by leveraging and enhancing LLMs. Software maintenance includes a wide range of activities essential for ensuring software quality throughout its evolution. Specifically, this dissertation addresses four key research problems related to software maintenance.

In the first study of this dissertation, we identify a specific limitation of a leading LLM pre-trained on code snippets and documentation: it struggles to generalize to code changes. We then propose a novel LLM specifically designed for code changes, namely CCBERT, which captures fine-grained changes at the token level.
In the second study, we present VulMaster, a framework designed to enhance the effectiveness of LLMs in vulnerability repair. VulMaster addresses the input length limitations of LLMs and integrates both expert knowledge from an expert system and the structural information present in the vulnerable code.
In the third study, we present LLM4PatchCorrect, an LLM-based approach for assessing patch correctness.
LLM4PatchCorrect is designed to accurately assess patches produced by new or unseen Automated Program Repair (APR) tools.
In the fourth study, we investigate the impact of long-tailed distributions on the performance of popular LLMs, especially in software maintenance tasks like automatic code review and vulnerability type prediction. Our findings indicate that LLMs face challenges in achieving good performance on the tail data.

In conclusion, this dissertation contributes to the software engineering field by demonstrating the potential of LLMs to enhance automated software maintenance tasks.

Degree Awarded

PhD in Computer Science

Discipline

Programming Languages and Compilers | Software Engineering

Supervisor(s)

LO, David

First Page

1

Last Page

255

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
