Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2025

Abstract

Recent advancements in large language models (LLMs) have significantly improved code generation, which automatically produces code snippets from natural language requirements. Despite achieving state-of-the-art performance, LLMs often struggle to generate accurate and reliable code, requiring developers to spend substantial effort debugging and evaluating the generated output. Researchers have proposed leveraging consistency to select code that passes more tests (inter-consistency) and demonstrates consistent behavior across more counterparts (intra-consistency). However, since the tests themselves are also generated by LLMs, majority voting based on incorrect tests leads to unreliable results. To address this, we propose ConTested, a lightweight interaction framework that incorporates user feedback to effectively guide consistency-based selection. In each iteration, ConTested applies a rank-correct-fix co-evolution process between code and tests, which iteratively enhances the quality of both and makes the consistency voting between code and tests more reliable. Our results demonstrate that, with minimal human effort, performance can be significantly improved. We evaluate ConTested through extensive experiments, demonstrating its effectiveness across multiple LLMs, including GPT-3.5 and GPT-4o. Our results show improvements of 32.9% over GPT-3.5 and 16.97% over GPT-4o. Additionally, ConTested achieves an 11.1% improvement over the SOTA post-processing technique, MPSC. These gains are achieved with only four rounds of user interaction, requiring minimal user effort. A user study further confirms the feasibility and cost-effectiveness of ConTested, highlighting its ability to enhance code generation without introducing substantial overhead.
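
The sketch below illustrates the consistency-voting idea the abstract describes: candidate programs are scored by how many generated tests they pass (inter-consistency) and by how many counterparts share their pass/fail behavior (intra-consistency). This is a minimal, hypothetical example under assumed names and an assumed scoring weight; it is not the paper's actual ConTested implementation, and it omits the user-feedback step that corrects incorrect tests.

```python
from typing import Callable, List, Tuple

def run_test(candidate: Callable, test: Callable[[Callable], bool]) -> bool:
    """Return True if the candidate passes the test; any exception counts as a failure."""
    try:
        return bool(test(candidate))
    except Exception:
        return False

def rank_candidates(
    candidates: List[Callable],
    tests: List[Callable[[Callable], bool]],
) -> List[Tuple[int, float]]:
    """Rank candidate indices by a simple consistency score (illustrative only)."""
    # Outcome matrix: outcomes[i][j] is True iff candidate i passes test j.
    outcomes = [[run_test(c, t) for t in tests] for c in candidates]

    scores = []
    for i, row in enumerate(outcomes):
        inter = sum(row)  # inter-consistency: number of tests passed
        # Intra-consistency proxy: how many other candidates show the same
        # pass/fail pattern. The 0.5 weight is an arbitrary assumption.
        intra = sum(1 for other in outcomes if other == row) - 1
        scores.append((i, inter + 0.5 * intra))

    return sorted(scores, key=lambda s: s[1], reverse=True)

# Example: three candidate implementations of absolute value and four
# generated tests, one of which is incorrect. In ConTested, user feedback
# would target such incorrect tests; here we only show the voting step.
cand_a = lambda x: abs(x)
cand_b = lambda x: x                      # buggy for negative inputs
cand_c = lambda x: x if x >= 0 else -x    # behaviorally identical to cand_a

tests = [
    lambda f: f(3) == 3,
    lambda f: f(-3) == 3,
    lambda f: f(-2) == 2,
    lambda f: f(-1) == -1,  # an incorrect generated test
]

print(rank_candidates([cand_a, cand_b, cand_c], tests))
# The two correct candidates outrank the buggy one despite the bad test.
```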

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Areas of Excellence

Digital transformation

Publication

Proceedings of the ACM on Software Engineering, Volume 2, Issue ISSTA, Trondheim, Norway, June 25-28, 2025

First Page

1

Last Page

22

Identifier

10.1145/3728902

Publisher

ACM

City or Country

New York

Additional URL

https://doi.org/10.1145/3728902
