Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

12-2023

Abstract

In recent years, software engineering (SE) has witnessed significant growth, leading to the creation and sharing of an abundance of software artifacts such as source code, bug reports, and pull requests. Analyzing these artifacts is crucial for comprehending the sentiments of software developers and automating various SE tasks, ultimately leading to more human-centered automated SE and enhancing software development efficiency. However, the diverse and unstructured nature of software text poses a significant challenge to this analysis. In response, researchers have investigated a variety of approaches, including the utilization of natural language processing techniques. The advent of large language models (LLMs), ranging from smaller-size LLMs (sLLMs) like BERT to bigger ones (bLLMs) such as LLaMA, has ignited a growing interest in their potential for analyzing software-related text.

This dissertation explores how LLMs can automate SE tasks that involve classification, ranking, and generation. In the first study, we assess the efficacy of sLLMs, such as BERT, in SE sentiment analysis, comparing them to existing SE-specific tools. Furthermore, we compare the performance of bLLMs with sLLMs in this context. In the second study, we address the issue of retrieving duplicate bug reports. We first create a benchmark and then use bLLMs to improve retrieval accuracy, with a specific focus on employing GPT-3.5 to suggest duplicate bug reports. In the third study, we propose leveraging sLLMs to generate precise and concise pull request titles.

In conclusion, this dissertation contributes to the SE field by exploring the potential of LLMs to support software developers in understanding sentiments and improving the efficiency of software development.

Keywords

large language models, sentiment analysis, software engineering, duplicate bug reports, pull request

Degree Awarded

PhD in Computer Science

Discipline

Programming Languages and Compilers | Software Engineering

Supervisor(s)

LO, David; JIANG, Lingxiao

First Page

1

Last Page

184

Publisher

Singapore Management University

City or Country

Singapore

Citation

ZHANG, Ting. Supporting software engineers with large language model-based automation. (2023). 1-184.
Available at: https://ink.library.smu.edu.sg/etd_coll/545

Copyright Owner and License

Author

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Included in

Programming Languages and Compilers Commons, Software Engineering Commons

Dissertations and Theses Collection (Open Access)

Supporting software engineers with large language model-based automation