Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2025
Abstract
Recent advances in large language models (LLMs) have significantly improved code generation, i.e., the automatic generation of code snippets from natural language requirements. Despite achieving state-of-the-art performance, LLMs often struggle to produce accurate and reliable code, forcing developers to spend substantial effort debugging and evaluating the generated output. Researchers have proposed leveraging consistency to select code that passes more tests (inter-consistency) and behaves consistently with more of its counterparts (intra-consistency). However, since the tests themselves are also generated by LLMs, majority voting based on incorrect tests leads to unreliable results. To address this, we propose ConTested, a lightweight interaction framework that incorporates user feedback to guide consistency-based selection; our results demonstrate that minimal human effort yields significant performance improvements. In each interaction round, ConTested applies a rank-correct-fix co-evolution process between code and tests, which iteratively improves the quality of both and makes the consistency voting between them more reliable. We evaluate ConTested through extensive experiments, demonstrating its effectiveness across multiple LLMs, including GPT-3.5 and GPT-4o. Our results show improvements of 32.9% over GPT-3.5 and 16.97% over GPT-4o. Additionally, ConTested achieves an 11.1% improvement over the state-of-the-art post-processing technique MPSC. These gains require only four rounds of interaction with users, i.e., minimal user effort. A user study further confirms the feasibility and cost-effectiveness of ConTested, highlighting its ability to enhance code generation without introducing substantial overhead.
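To make the inter-consistency idea mentioned above concrete, the following is a minimal illustrative sketch (not from the paper): it scores hypothetical candidate snippets by how many LLM-generated tests they pass and selects the highest-scoring one. The candidate code, tests, and the `passes` helper are all assumptions for illustration; the abstract's point is that such voting is only as reliable as the generated tests, which is why ConTested adds user feedback to correct code and tests over successive rounds.

```python
# Illustrative sketch (not from the paper): inter-consistency voting that
# selects the candidate solution passing the most LLM-generated tests.
# The candidate snippets and test cases below are hypothetical.

candidates = [
    "def add(a, b):\n    return a - b",   # buggy candidate
    "def add(a, b):\n    return a + b",   # correct candidate
]

generated_tests = [
    "assert add(1, 2) == 3",
    "assert add(0, 0) == 0",
    "assert add(-1, 1) == 0",
]

def passes(code: str, test: str) -> bool:
    """Run one generated test against one candidate in a fresh namespace."""
    env = {}
    try:
        exec(code, env)   # define the candidate function
        exec(test, env)   # raises AssertionError if the test fails
        return True
    except Exception:
        return False

# Score each candidate by the number of tests it passes (inter-consistency).
scores = [sum(passes(c, t) for t in generated_tests) for c in candidates]
best = candidates[scores.index(max(scores))]
print(f"scores={scores}")  # e.g. [1, 3]
print(best)
```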
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Areas of Excellence
Digital transformation
Publication
Proceedings of the ACM on Software Engineering, Volume 2, Issue ISSTA, Trondheim, Norway, June 25-28, 2025
First Page
1
Last Page
22
Identifier
10.1145/3728902
Publisher
ACM
City or Country
New York
Citation
DONG, Jinhao; SUN, Jun; ZHANG, Wenjie; DONG, Jinsong; and HAO, Dan.
ConTested: Consistency-aided tested code generation with LLM. (2025). Proceedings of the ACM on Software Engineering, Volume 2, Issue ISSTA, Trondheim, Norway, June 25-28, 2025. 1-22.
Available at: https://ink.library.smu.edu.sg/sis_research/10284
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3728902