Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

12-2018

Abstract

Scaling decision-theoretic planning to large multiagent systems is challenging due to uncertainty and partial observability in the environment. We focus on a multiagent planning model subclass, relevant to urban settings, where agent interactions depend on their "collective influence" on each other, rather than their identities. Unlike previous work, we address a general setting where system reward is not decomposable among agents. We develop collective actor-critic RL approaches for this setting, and address the problems of multiagent credit assignment and of computing low-variance policy gradient estimates that result in faster convergence to high-quality solutions. We also develop difference-rewards-based credit assignment methods for the collective setting. Empirically, our new approaches provide significantly better solutions than previous methods in the presence of global rewards on two real-world problems modeling taxi fleet optimization and multiagent patrolling, and a synthetic grid navigation domain.

Keywords

Credit assignment methods, Decision-theoretic planning, Faster convergence, High-quality solutions, Multi-agent patrolling, Multi-agent planning, Partial observability, Real-world problem

Discipline

Artificial Intelligence and Robotics | Operations Research, Systems Engineering and Industrial Engineering

Research Areas

Intelligent Systems and Optimization

Publication

Advances in Neural Information Processing Systems (NIPS 2018): Montreal, Canada, December 2-8

First Page

8102

Last Page

8113

ISSN

1049-5258

Publisher

MIT Press

City or Country

Cambridge

Copyright Owner and License

Authors

Additional URL

https://papers.nips.cc/paper/8033-credit-assignment-for-collective-multiagent-rl-with-global-rewards
