Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
8-2025
Abstract
Large Language Models (LLMs) continue to set new standards in knowledge-intensive and complex reasoning tasks, yet their high computational demands limit widespread adoption. While distilling large models into smaller ones offers a sustainable solution, current techniques—such as static knowledge distillation, resource-intensive reinforcement learning from human feedback, or limited self-reflection—struggle to yield substantial and lasting performance gains. In this paper, we present a novel Debate and Reflect (D&R) framework that orchestrates multi-turn debates between smaller models and stronger teacher models, eliciting actionable feedback (e.g., error analysis, corrective strategies) to guide student models. Further, we introduce Tree-structured Direct Preference Optimization (T-DPO) to efficiently leverage these debate logs, organizing interactions into a hierarchical format for effective training. Empirical evaluations across diverse NLP benchmarks demonstrate that our approach significantly improves smaller-model accuracy, robustness, and generalization, outperforming conventional baselines by a large margin.
Discipline
Artificial Intelligence and Robotics | Programming Languages and Compilers
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, July 27 - August 1
First Page
9122
Last Page
9137
Identifier
10.18653/v1/2025.findings-acl.475
Publisher
ACL
City or Country
Austria
Citation
ZHOU, Xiaofeng; HUANG, Heyan; and LIAO, Lizi.
Debate, reflect, and distill: Multi-agent feedback with tree-structured preference optimization for efficient language model enhancement. (2025). Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, July 27 - August 1. 9122-9137.
Available at: https://ink.library.smu.edu.sg/sis_research/10758
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/2025.findings-acl.475
Included in
Artificial Intelligence and Robotics Commons, Programming Languages and Compilers Commons