Publication Type
PhD Dissertation
Version
publishedVersion
Publication Date
8-2021
Abstract
In the current age, rapid growth in sectors such as finance and transportation involves fast digitization of industrial processes. This creates a huge opportunity for next-generation artificial intelligence systems with multiple agents operating at scale. Multiagent reinforcement learning (MARL) is the field of study that addresses problems in multiagent systems. In this thesis, we develop and evaluate novel MARL methodologies that address the challenges of large-scale multiagent systems in the cooperative setting. One of the key challenges in cooperative MARL is the problem of credit assignment. Many previous approaches to this problem rely on each agent's individual trajectory, which limits scalability to a small number of agents. Our proposed methodologies are based solely on aggregate information, which provides high scalability: the dimension of the key statistics does not change with the agent population size. In this thesis, we also address other challenges that arise in MARL, such as variable-duration actions, and present preliminary work on credit assignment under a sparse reward model.
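To make the aggregate-information idea concrete, here is a minimal sketch (an illustration only, not the thesis's exact formulation): a joint state over many agents is summarized by per-zone agent counts, so the size of the statistic is fixed by the number of zones rather than by the population.

import numpy as np

def count_statistic(agent_zones, num_zones):
    # Summarize the joint state by the number of agents in each zone.
    # The vector length depends only on num_zones, not on how many
    # agents there are, so it stays fixed as the population grows.
    counts = np.zeros(num_zones)
    for z in agent_zones:
        counts[z] += 1
    return counts

# Five agents or five thousand: the statistic has the same dimension.
print(count_statistic([0, 2, 2, 1, 0], num_zones=4))   # [2. 1. 2. 0.]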
The first part of this thesis investigates the challenges in a maritime traffic management (MTM) problem, one of the motivating domains for large-scale cooperative multiagent systems. The key research question is how to coordinate vessels in a heavily trafficked maritime environment to increase the safety of navigation by reducing traffic congestion. The MTM problem is an instance of cooperative MARL with shared reward: vessels share the same penalty cost for any congestion, so the problem suffers from credit assignment. We address it by developing a vessel-based value function over aggregate information, which performs effective credit assignment by evaluating an agent's policy while filtering out the contributions of other agents. Although this first approach achieved promising results, its ability to handle variable-duration actions, a crucial feature of the problem domain, is rather limited. We therefore address this challenge using hierarchical reinforcement learning, a framework for control with variable-duration actions, and develop a novel hierarchical learning based approach for the maritime traffic control problem. We introduce the notion of a meta action, a high-level action that takes a variable amount of time to execute, and propose an individual meta value function over aggregate information that effectively addresses the credit assignment problem.
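As a rough sketch of how variable-duration (meta) actions are typically handled in hierarchical, semi-Markov-style reinforcement learning (a generic illustration, not necessarily the thesis's exact update rule): the discount factor is raised to the number of primitive steps the meta action took.

def smdp_q_update(Q, s, a, reward, duration, s_next, actions,
                  alpha=0.1, gamma=0.99):
    # Semi-MDP style Q-learning update for a meta action that ran for
    # `duration` primitive steps and accrued `reward` (already
    # discounted within the action). Illustrative sketch only.
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    target = reward + (gamma ** duration) * best_next
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (target - old)
    return Q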
We also develop a general approach to credit assignment in large-scale cooperative multiagent systems for both discrete and continuous action settings. We extend a shaped-reward approach known as difference rewards (DR) to address the credit assignment problem. DRs are an effective tool for this problem, but their computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information. One limitation of this DR-based approach to credit assignment is that it relies on learning a good approximation of the reward model. In a sparse reward setting, however, agents do not receive any informative immediate reward signal until the episode ends, so the shaped-reward approach is not effective in that case. In this thesis, we also present preliminary work in this direction.
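For reference, the standard difference-reward idea in its naive form (a generic sketch; the thesis's contribution is approximating this from aggregate information rather than re-evaluating the reward per agent): agent i's shaped reward is the global reward minus the global reward obtained when agent i's action is replaced by a default.

def difference_reward(global_reward, joint_action, i, default_action):
    # Naive difference reward for agent i: compare the global reward of
    # the actual joint action with a counterfactual in which agent i's
    # action is swapped for a default. The per-agent counterfactual
    # evaluation is what makes exact computation expensive at scale.
    counterfactual = list(joint_action)
    counterfactual[i] = default_action
    return global_reward(joint_action) - global_reward(counterfactual)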
Keywords
Multiagent Reinforcement Learning
Degree Awarded
PhD in Information Systems
Discipline
Artificial Intelligence and Robotics | Operations Research, Systems Engineering and Industrial Engineering
Supervisor(s)
LAU, Hoong Chuin; KUMAR, Akshat
First Page
1
Last Page
197
Publisher
Singapore Management University
City or Country
Singapore
Citation
SINGH, Arambam James.
Credit assignment in multiagent reinforcement learning for large agent population. (2021). 1-197.
Available at: https://ink.library.smu.edu.sg/etd_coll/364
Copyright Owner and License
Author
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Included in
Artificial Intelligence and Robotics Commons, Operations Research, Systems Engineering and Industrial Engineering Commons