Publication Type
Journal Article
Version
publishedVersion
Publication Date
3-2026
Abstract
Evaluating the alignment of large language models (LLMs) with user-defined coding preferences is a challenging endeavor that requires a deep assessment of LLMs' outputs. Existing methods and benchmarks rely primarily on automated metrics and static analysis tools, which often fail to capture the nuances of user instructions and LLM outputs. To address this gap, we introduce the LLM-as-a-Judge evaluation framework and present CodeUltraFeedback, a comprehensive dataset for assessing and improving LLM alignment with coding preferences. CodeUltraFeedback consists of 10,000 coding instructions, each paired with four responses generated by a diverse pool of 14 LLMs. These responses are annotated using GPT-3.5 as a judge, with both ranking-based scores and detailed textual feedback across five distinct coding preferences. Our analysis reveals that responses from GPT-3.5 and GPT-4 are consistently rated higher than those from open-weight models, underscoring substantial alignment gaps between closed- and open-weight LLMs. We then explore using CodeUltraFeedback as feedback data to fine-tune and align CodeLlama-7B-Instruct through supervised fine-tuning (SFT) and reinforcement learning from AI feedback (RLAIF) with direct preference optimization (DPO). The resulting aligned model achieves average alignment improvements of 22.7% and 29.7% when evaluated with GPT-3.5 and GPT-4 judges, respectively. Notably, our aligned CodeLlama-7B-Instruct surpasses much larger models, such as CodeLlama-13B and 34B, in alignment with coding preferences. Despite not being explicitly trained for functional correctness, it also achieves relative improvements of 10.5% and 26.6% in Pass@1 and Pass@10 on the HumanEval+ benchmark. Our contributions demonstrate the practical value of preference tuning in code generation and set the stage for further progress in model alignment and RLAIF for automated software engineering.
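The abstract names two technical ingredients that are easy to make concrete. First, a minimal sketch of the DPO preference loss used in the RLAIF step; this is the standard objective from Rafailov et al. (2023), not the authors' implementation, and all function and variable names here are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct preference optimization loss over chosen/rejected response pairs.

    Each argument is a batch of summed token log-probabilities for the
    judge-preferred ("chosen") or dispreferred ("rejected") response, under
    either the policy being tuned or the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-ratios of policy vs. reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin by which the chosen response out-scores the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Second, the reported Pass@1 and Pass@10 figures follow the standard unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021), computed from n sampled completions of which c pass the tests:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n total (c of them correct), passes."""
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), evaluated as a numerically stable product.
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```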
Keywords
Large language models, code generation, automated software engineering, reinforcement learning from AI feedback, direct preference optimization, LLM-as-a-Judge
Discipline
Artificial Intelligence and Robotics | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
ACM Transactions on Software Engineering and Methodology
Volume
35
Issue
3
First Page
1
Last Page
36
ISSN
1049-331X
Identifier
10.1145/3736407
Publisher
Association for Computing Machinery (ACM)
Citation
Weyssow, Martin; Kamanda, Aton; Zhou, Xin; and Sahraoui, Houari.
CodeUltraFeedback: An LLM-as-a-Judge dataset for aligning Large Language Models to coding preferences. (2026). ACM Transactions on Software Engineering and Methodology, 35(3), 1-36.
Available at: https://ink.library.smu.edu.sg/sis_research/11049
Copyright Owner and License
Authors-CC-BY
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.
Additional URL
https://doi.org/10.1145/3736407