Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2025
Abstract
Inferring reward functions from demonstrations is a key challenge in reinforcement learning (RL), particularly in multi-agent RL (MARL). The large joint state-action spaces and intricate inter-agent interactions in MARL make inferring the joint reward function especially challenging. While prior studies in single-agent settings have explored ways to recover reward functions and expert policies from human preference feedback, such studies in MARL remain limited. Existing methods typically combine two separate stages: supervised reward learning and standard MARL algorithms, leading to unstable training. In this work, we exploit the inherent connection between reward functions and Q functions in cooperative MARL to introduce a novel end-to-end preference-based learning framework. Our framework is supported by a carefully designed multi-agent value decomposition strategy that enhances training efficiency. Extensive experiments on two state-of-the-art benchmarks, SMAC and MAMuJoCo, using preference data generated by both rule-based and large language model approaches demonstrate that our algorithm consistently outperforms existing methods across various tasks.
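The preference-based reward learning the abstract contrasts against is commonly formalized with the Bradley-Terry model, where the probability that one trajectory segment is preferred over another is a sigmoid of the difference in their cumulative rewards. The sketch below is a generic illustration of that standard two-stage baseline's reward-learning loss, not the paper's end-to-end method; all function names are illustrative.

```python
import math

def segment_return(rewards):
    """Sum of per-step rewards over a trajectory segment."""
    return sum(rewards)

def bradley_terry_loss(rewards_a, rewards_b, prefer_a=True):
    """Negative log-likelihood of an observed preference under the
    Bradley-Terry model: P(a preferred over b) = sigmoid(R(a) - R(b)).
    Minimizing this over labeled segment pairs fits a reward model."""
    diff = segment_return(rewards_a) - segment_return(rewards_b)
    p_a = 1.0 / (1.0 + math.exp(-diff))  # probability segment a wins
    return -math.log(p_a if prefer_a else 1.0 - p_a)
```

For equal returns the loss is log 2 (the model is maximally uncertain), and it shrinks as the preferred segment's predicted return pulls ahead, which is the signal a supervised reward-learning stage optimizes before handing the learned reward to a standard MARL algorithm.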
Keywords
Multi-agent Reinforcement Learning, Preference Learning
Discipline
Artificial Intelligence and Robotics
Areas of Excellence
Digital transformation
Publication
Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), Vancouver, Canada, July 13-19
First Page
1
Last Page
31
City or Country
Vancouver, Canada
Citation
BUI, The Viet; MAI, Tien; and NGUYEN, Hong Thanh.
O-MAPL: Offline multi-agent preference learning. (2025). Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), Vancouver, Canada, July 13-19. 1-31.
Available at: https://ink.library.smu.edu.sg/sis_research/10708
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://openreview.net/pdf?id=FYvrNKYu6H