Research Collection School Of Computing and Information Systems

Learning and exploiting shaped reward models for large scale multiagent RL

Arambam James SINGH, Singapore Management University
Akshat KUMAR, Singapore Management University
Hoong Chuin LAU, Singapore Management University

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

8-2021

Abstract

Many real-world systems, such as air traffic control, involve interactions among a large number of agents working toward a common goal. Several model-free RL algorithms have been proposed for such settings. A key limitation is that, in the model-free case, the empirical reward signal is not very effective at addressing the multiagent credit assignment problem, that is, determining each agent's contribution to the team's success. This leads to lower solution quality and high sample complexity. To address this, we contribute (a) an approach to learn a differentiable reward model for both continuous and discrete action settings by exploiting the collective nature of interactions among agents, a feature commonly present in large-scale multiagent applications; (b) a shaped reward model, analytically derived from the learned reward model, to address the key challenge of credit assignment; and (c) a model-based multiagent RL approach that integrates shaped rewards into well-known RL algorithms such as policy gradient and soft actor-critic. Compared to previous methods, our learned reward models are more accurate, and our approaches achieve better solution quality on synthetic and real-world instances of air traffic control and on cooperative navigation with large agent populations.
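The abstract does not state the exact form of the shaped reward. Purely as an illustration, the sketch below shows one widely used shaped reward for collective (count-based) interactions, the difference reward D_k = R(n) - R(n - e_k), where n is the vector of agent counts per bin and e_k removes one agent from bin k. It also shows how a differentiable reward model allows the first-order approximation D_k ≈ ∂R/∂n_k. The toy congestion reward and all function names (`reward_model`, `reward_grad`, `exact_difference_rewards`, `approx_difference_rewards`) are hypothetical assumptions, not the authors' learned model or their derivation.

```python
import numpy as np


def reward_model(counts, capacity):
    """Hypothetical toy team reward (assumption, not the learned model):
    R(n) = -sum_k max(n_k - c_k, 0)^2, penalizing congestion above capacity."""
    excess = np.maximum(counts - capacity, 0.0)
    return -np.sum(excess ** 2)


def reward_grad(counts, capacity):
    """Analytic gradient dR/dn_k of the toy reward above."""
    excess = np.maximum(counts - capacity, 0.0)
    return -2.0 * excess


def exact_difference_rewards(counts, capacity):
    """D_k = R(n) - R(n - e_k): team reward minus the reward obtained
    if one agent were removed from bin k (0 for empty bins)."""
    base = reward_model(counts, capacity)
    d = np.zeros_like(counts, dtype=float)
    for k in range(len(counts)):
        if counts[k] > 0:
            reduced = counts.copy()
            reduced[k] -= 1
            d[k] = base - reward_model(reduced, capacity)
    return d


def approx_difference_rewards(counts, capacity):
    """First-order approximation R(n) - R(n - e_k) ≈ dR/dn_k, which needs
    only one gradient evaluation instead of one model call per bin."""
    return reward_grad(counts, capacity)


counts = np.array([5.0, 2.0, 0.0])    # agents per bin (hypothetical)
capacity = np.array([3.0, 3.0, 3.0])  # per-bin capacity (hypothetical)
print(exact_difference_rewards(counts, capacity))
print(approx_difference_rewards(counts, capacity))
```

Each agent in bin k would then receive D_k in place of the shared team reward, so an agent contributing to congestion gets a negative signal while agents in uncongested bins are not penalized. The gradient form is attractive for large populations because its cost does not grow with the number of bins evaluated, at the price of approximation error (here -4 versus the exact -3 for the congested bin).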

Keywords

Model Representation And Learning Domain Models For Planning, Multi-agent Planning And Learning

Discipline

Artificial Intelligence and Robotics | Operations Research, Systems Engineering and Industrial Engineering

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the Thirty-First International Conference on Automated Planning and Scheduling 2021: August 2-13, Guangzhou

First Page

588

Last Page

596

Publisher

AAAI Press

City or Country

Menlo Park, CA

Embargo Period

7-9-2021

Citation

SINGH, Arambam James; KUMAR, Akshat; and LAU, Hoong Chuin. Learning and exploiting shaped reward models for large scale multiagent RL. (2021). Proceedings of the Thirty-First International Conference on Automated Planning and Scheduling 2021: August 2-13, Guangzhou. 588-596.
Available at: https://ink.library.smu.edu.sg/sis_research/6032

Copyright Owner and License

Publisher

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

https://ojs.aaai.org/index.php/ICAPS/article/view/16007


Included in

Artificial Intelligence and Robotics Commons, Operations Research, Systems Engineering and Industrial Engineering Commons
