Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

5-2025

Abstract

Formal program specifications play a crucial role in many stages of software development, including requirement analysis, software testing, and verification. However, writing them by hand is difficult, time-consuming, and labor-intensive, and it is harder still to write specifications that correctly and comprehensively describe the semantics of complex programs. Automated specification generation methods have emerged to reduce this burden on developers. Existing methods, however, usually rely on predefined templates or grammars, so they struggle to accurately describe the behavior and functionality of complex real-world programs. To tackle this challenge, we introduce SpecGen, a novel technique for formal program specification generation based on Large Language Models (LLMs). Our key insight is to overcome the limitations of existing methods by leveraging the code comprehension capability of LLMs. SpecGen works in two phases. The first phase uses a conversational approach that guides the LLM to generate appropriate specifications for a given program, exploiting the LLM's ability to produce high-quality specifications. The second phase, designed for cases where the LLM fails to generate correct specifications, applies four mutation operators to the model-generated specifications and selects verifiable specifications from the mutated variants through a novel heuristic selection strategy that assigns different weights to the variants for efficiency. We evaluate SpecGen on two datasets: the SV-COMP Java category benchmark and a manually constructed dataset containing 120 programs. Experimental results demonstrate that SpecGen succeeds in generating verifiable specifications for 279 out of 385 programs, outperforming existing LLM-based approaches and conventional specification generation tools such as Houdini and Daikon. Further investigation of the quality of the generated specifications indicates that SpecGen can comprehensively articulate the behaviors of the input program.
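
As a rough illustration of the two-phase idea described in the abstract (propose specifications, then mutate the ones that fail verification and pick variants by weight), the following is a minimal sketch only. Everything in it is an assumption made for illustration: the abstract does not name the four mutation operators, the weighting scheme, or the verifier, so `mock_llm_candidates`, the two toy operators in `mutate`, the weights, and the test-based `toy_verify` check are hypothetical stand-ins, not SpecGen's actual components.

```python
import itertools


def program(a, b):
    """Program under analysis: returns the larger of two integers."""
    return a if a >= b else b


def toy_verify(spec):
    """Stand-in for a real verifier: checks a postcondition on a small input grid."""
    for a, b in itertools.product(range(-3, 4), repeat=2):
        result = program(a, b)
        if not eval(spec, {}, {"a": a, "b": b, "result": result}):
            return False
    return True


def mock_llm_candidates():
    """Phase 1 stand-in: pretend a model proposed these postconditions."""
    return ["result >= a", "result > b", "result == a or result == b"]


def mutate(spec):
    """Phase 2 stand-in: hypothetical mutation operators with made-up weights."""
    variants = []
    # Hypothetical operator A: relax a strict comparison.
    if " > " in spec:
        variants.append((spec.replace(" > ", " >= "), 2.0))
    if " < " in spec:
        variants.append((spec.replace(" < ", " <= "), 2.0))
    # Hypothetical operator B: weaken an equality to an inequality.
    if " == " in spec:
        variants.append((spec.replace(" == ", " >= "), 1.0))
    return variants


def generate_specs():
    """Keep candidates that verify; otherwise try mutated variants, heaviest first."""
    verified = []
    for spec in mock_llm_candidates():
        if toy_verify(spec):
            verified.append(spec)
            continue
        for variant, _weight in sorted(mutate(spec), key=lambda v: -v[1]):
            if toy_verify(variant):
                verified.append(variant)
                break
    return verified


if __name__ == "__main__":
    for spec in generate_specs():
        print(spec)
```

In this toy run the candidate `result > b` fails the check (it is false whenever `b` is the larger input), and the relaxed variant `result >= b` is accepted in its place. A real pipeline would replace `toy_verify` with a sound program verifier rather than a finite input grid.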

Keywords

program verification, specification inference, large language model

Discipline

Programming Languages and Compilers | Software Engineering

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the ICSE 2025 47th International Conference on Software Engineering, Ontario, Canada, April 27 - May 3

First Page

16

Last Page

28

Identifier

10.1109/ICSE55347.2025.00129

Publisher

IEEE

City or Country

Los Alamitos, CA

Citation

MA, Lezhi; LIU, Shangqing; LI, Yi; XIE, Xiaofei; and BU, Lei. SpecGen: Automated generation of formal program specifications via large language models. (2025). Proceedings of the ICSE 2025 47th International Conference on Software Engineering, Ontario, Canada, April 27 - May 3. 16-28.
Available at: https://ink.library.smu.edu.sg/sis_research/10329

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/ICSE55347.2025.00129

Included in

Programming Languages and Compilers Commons, Software Engineering Commons

Research Collection School Of Computing and Information Systems

SpecGen: Automated generation of formal program specifications via large language models
