Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

12-2021

Abstract

Recent work in multi-agent reinforcement learning (MARL) by (Zhang et al., ICML 2018) provided the first decentralized actor-critic algorithm to offer convergence guarantees. In that work, policies are stochastic and defined over finite action spaces. We extend those results to develop a provably convergent decentralized actor-critic algorithm for learning deterministic policies on continuous action spaces. Deterministic policies are important in many real-world settings. To handle the lack of exploration inherent in deterministic policies, we provide results for the off-policy setting as well as the on-policy setting. We provide the main ingredients needed for this problem: the expression of a local deterministic policy gradient, a decentralized deterministic actor-critic algorithm, and convergence guarantees when the value functions are approximated linearly. This work enables decentralized MARL in high-dimensional action spaces and paves the way for more widespread application of MARL.
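
To give a rough sense of the ingredients named in the abstract, the following is a minimal Python sketch of a single agent's on-policy update with a linear actor, a linear critic, and a consensus step on critic parameters. It is an illustrative assumption, not the paper's algorithm: the feature map, linear actor, uniform consensus weights, and step sizes are made up, and for readability the critic here evaluates only agent i's own action rather than the joint action.

```python
import numpy as np

# Hypothetical sketch of one decentralized deterministic actor-critic step
# with a linear critic; all design choices below are illustrative assumptions.

def phi(s, a):
    """Hypothetical state-action features for the linear critic Q(s, a) = w @ phi(s, a)."""
    return np.concatenate([s, a, np.outer(s, a).ravel()])

def grad_a_Q(w, s, a):
    """Gradient of the linear critic with respect to the action, for the feature map above."""
    n_s, n_a = len(s), len(a)
    w_a = w[n_s:n_s + n_a]
    w_sa = w[n_s + n_a:].reshape(n_s, n_a)
    return w_a + w_sa.T @ s

def local_step(Theta_i, w_i, neighbor_ws, s, a, r_i, s_next,
               alpha_w=1e-2, alpha_theta=1e-3, gamma=0.99):
    """One on-policy update for agent i, whose deterministic action is a_i = Theta_i @ s."""
    a_next = Theta_i @ s_next
    # Critic: one-step TD(0) update of the linear value approximation.
    delta = r_i + gamma * w_i @ phi(s_next, a_next) - w_i @ phi(s, a)
    w_i = w_i + alpha_w * delta * phi(s, a)
    # Consensus: mix critic parameters with neighbors (uniform weights, an assumption).
    w_i = np.mean([w_i] + list(neighbor_ws), axis=0)
    # Actor: local deterministic policy gradient, chain rule through a = Theta_i @ s.
    Theta_i = Theta_i + alpha_theta * np.outer(grad_a_Q(w_i, s, Theta_i @ s), s)
    return Theta_i, w_i

# Tiny usage example with made-up dimensions and data.
rng = np.random.default_rng(0)
n_s, n_a = 3, 2
Theta = rng.normal(size=(n_a, n_s))
w = np.zeros(n_s + n_a + n_s * n_a)
s, s_next = rng.normal(size=n_s), rng.normal(size=n_s)
Theta, w = local_step(Theta, w, [w.copy()], s, Theta @ s, r_i=1.0, s_next=s_next)
```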

Discipline

Numerical Analysis and Scientific Computing

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 60th IEEE Conference on Decision and Control, CDC 2021, Austin, TX, December 14-17

First Page

1548

Last Page

1553

ISBN

9781665436595

Identifier

10.1109/CDC45484.2021.9683356

Publisher

IEEE

City or Country

Piscataway, NJ

Additional URL

https://doi.org/10.1109/CDC45484.2021.9683356
