Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2025
Abstract
Recent advances in large language models (LLMs) have significantly improved code generation, i.e., the automatic generation of code snippets from natural language requirements. Despite achieving state-of-the-art performance, LLMs often struggle to produce accurate and reliable code, forcing developers to spend substantial effort debugging and evaluating the generated output. Researchers have proposed leveraging consistency to select code that passes more tests (inter-consistency) and behaves consistently with more of its counterparts (intra-consistency). However, since the tests themselves are also generated by LLMs, majority voting based on incorrect tests leads to unreliable results. To address this, we propose ConTested, a lightweight interaction framework that incorporates user feedback to guide consistency-based selection; our results demonstrate that minimal human effort yields significant performance improvements. In each interaction round, ConTested applies a rank-correct-fix co-evolution process between code and tests, which iteratively improves the quality of both and makes the consistency voting between them more reliable. We evaluate ConTested through extensive experiments, demonstrating its effectiveness across multiple LLMs, including GPT-3.5 and GPT-4o. Our results show improvements of 32.9% over GPT-3.5 and 16.97% over GPT-4o. Additionally, ConTested achieves an 11.1% improvement over the state-of-the-art post-processing technique MPSC. These gains require only four rounds of interaction with users, i.e., minimal user effort. A user study further confirms the feasibility and cost-effectiveness of ConTested, highlighting its ability to enhance code generation without introducing substantial overhead.
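To make the inter-consistency idea mentioned above concrete, the following is a minimal illustrative sketch (not from the paper): it scores hypothetical candidate snippets by how many LLM-generated tests they pass and selects the highest-scoring one. The candidate code, tests, and the `passes` helper are all assumptions for illustration; the abstract's point is that such voting is only as reliable as the generated tests, which is why ConTested adds user feedback to correct code and tests over successive rounds.

```python
# Illustrative sketch (not from the paper): inter-consistency voting that
# selects the candidate solution passing the most LLM-generated tests.
# The candidate snippets and test cases below are hypothetical.

candidates = [
    "def add(a, b):\n    return a - b",   # buggy candidate
    "def add(a, b):\n    return a + b",   # correct candidate
]

generated_tests = [
    "assert add(1, 2) == 3",
    "assert add(0, 0) == 0",
    "assert add(-1, 1) == 0",
]

def passes(code: str, test: str) -> bool:
    """Run one generated test against one candidate in a fresh namespace."""
    env = {}
    try:
        exec(code, env)   # define the candidate function
        exec(test, env)   # raises AssertionError if the test fails
        return True
    except Exception:
        return False

# Score each candidate by the number of tests it passes (inter-consistency).
scores = [sum(passes(c, t) for t in generated_tests) for c in candidates]
best = candidates[scores.index(max(scores))]
print(f"scores={scores}")  # e.g. [1, 3]
print(best)
```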
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Areas of Excellence
Digital transformation
Publication
Proceedings of the ACM on Software Engineering, Volume 2, Issue ISSTA, Trondheim, Norway, June 25-28, 2025
First Page
1
Last Page
22
Identifier
10.1145/3728902
Publisher
ACM
City or Country
New York
Citation
DONG, Jinhao; SUN, Jun; ZHANG, Wenjie; DONG, Jinsong; and HAO, Dan.
ConTested: Consistency-aided tested code generation with LLM. (2025). Proceedings of the ACM on Software Engineering, Volume 2, Issue ISSTA, Trondheim, Norway, June 25-28, 2025. 1-22.
Available at: https://ink.library.smu.edu.sg/sis_research/10284
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3728902