Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
12-2017
Abstract
Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address one such subclass, called CDec-POMDP, in which the collective behavior of a population of agents affects the joint reward and environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDec-POMDP policies. Vanilla AC converges slowly on larger problems. To address this, we show how a particular decomposition of the approximate action-value function over agents leads to effective updates, and we also derive a new way to train the critic based on local reward signals. Comparisons on a synthetic benchmark and a real-world taxi fleet optimization problem show that our new AC approach provides better-quality solutions than previous best approaches.
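To make the decomposition concrete, the following is a minimal illustrative sketch, not the authors' implementation: an actor-critic loop in which the joint action-value function is approximated as a sum of per-agent local terms and the critic is trained from each agent's local reward signal. The toy environment, agent/state/action counts, and hyperparameters (M, S, A, env_step, alpha, beta, gamma) are all assumptions for illustration; the paper's CDec-POMDP benchmarks are far richer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: M agents share one policy over S local states
# and A actions. All constants here are illustrative assumptions.
M, S, A = 20, 4, 3
alpha, beta, gamma = 0.05, 0.1, 0.95

theta = np.zeros((S, A))   # shared softmax policy parameters (the actor)
w = np.zeros((S, A))       # per-agent local action-value table (the critic)

def pi(s):
    """Softmax policy over actions in local state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def env_step(states, actions):
    # Stand-in dynamics/reward: each agent receives a *local* reward, and
    # the fraction of agents per state perturbs it -- a crude proxy for the
    # population effects that define a CDec-POMDP.
    counts = np.bincount(states, minlength=S) / M
    rewards = (actions == states % A).astype(float) - counts[states]
    next_states = rng.integers(0, S, size=M)
    return next_states, rewards

states = rng.integers(0, S, size=M)
for step in range(2000):
    actions = np.array([rng.choice(A, p=pi(s)) for s in states])
    next_states, rewards = env_step(states, actions)
    for i in range(M):
        s, a, r, s2 = states[i], actions[i], rewards[i], next_states[i]
        # Critic: TD(0) update driven by the agent's local reward signal,
        # with an expected value over the shared policy at the next state.
        td = r + gamma * w[s2] @ pi(s2) - w[s, a]
        w[s, a] += beta * td
        # Actor: policy gradient with the decomposed critic; the joint Q is
        # approximated as a sum of local terms, so each agent contributes
        # grad log pi(a|s) scaled by its own local value w(s, a).
        grad_log = -pi(s)
        grad_log[a] += 1.0
        theta[s] += alpha * grad_log * w[s, a]
    states = next_states

print("learned policy per state:\n", np.round([pi(s) for s in range(S)], 2))
```

The structural points mirrored from the abstract are the single policy shared across the agent population, the additive per-agent critic, and a critic update based on local rather than global rewards.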
Keywords
Collective behavior, Environment dynamics, Multi-agent planning, Optimization problems, Reinforcement learning method, Sequential decision making, Synthetic benchmark, Value function approximation
Discipline
Artificial Intelligence and Robotics | Computer Sciences | Operations Research, Systems Engineering and Industrial Engineering
Research Areas
Intelligent Systems and Optimization
Publication
Advances in Neural Information Processing Systems 30: Proceedings of NIPS 2017, December 4-9, Long Beach, CA
First Page
4320
Last Page
4330
Publisher
NIPS Foundation
City or Country
La Jolla, CA
Citation
NGUYEN, Duc Thien; KUMAR, Akshat; and LAU, Hoong Chuin.
Policy gradient with value function approximation for collective multiagent planning. (2017). Advances in Neural Information Processing Systems: Proceedings of NIPS 2017, December 4-9, Long Beach. 4320-4330.
Available at: https://ink.library.smu.edu.sg/sis_research/3871
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://papers.nips.cc/paper/7019-policy-gradient-with-value-function-approximation-for-collective-multiagent-planning.pdf