Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

7-2022

Abstract

Intelligent agents are becoming increasingly prevalent in a wide variety of domains, including but not limited to transportation, safety and security. To better utilize this intelligence, there has been increasing focus on frameworks and methods for coordinating these agents. This thesis is specifically targeted at providing solution approaches for improving large-scale multi-agent systems with selfish intelligent agents. In such systems, the performance of an agent depends not just on the agent's own efforts, but also on other agents' decisions. The complexity of interactions among multiple agents, coupled with the large-scale nature of the problem domains and the uncertainties associated with the environment, makes decision making very challenging. In this work, we specifically study the problem from the perspective of a centralized aggregator that needs to maximize the revenue of the entire system.

To that end, we study this problem from both strategic and operational points of view. With regard to strategic decision making, we propose planning and deep reinforcement learning based solution algorithms that improve system performance by optimizing the adaptive operating hours of selfish agents and by providing flexible work schedules to them. From the operational point of view, we propose a novel mechanism to incentivise selfish agents, so that the performance of all the agents and of the overall system improves. In short, through strategic and operational decision making, we assist selfish agents in making intelligent decisions that result in improved system performance.

In the first part of this thesis, we focus on making strategic decisions for the workers in the digital gig economy. To provide a concrete context, we focus on taxi drivers in the transport gig economy. Taxi fleets and car aggregation systems are an important component of the urban public transportation system. Taxi fleets and car aggregation systems (e.g., Uber) depend on a large number of self-controlled and profit-driven drivers, which introduces inefficiencies in the system. There are two ways in which taxi fleet performance can be optimized: (i) operational decision making: improve the assignment of taxis/cars to customers, while accounting for future demand; (ii) strategic decision making: optimize the operating hours of (taxi and car) drivers. Existing research has primarily focused on the operational decisions in (i); we focus on the strategic decisions in (ii).

We first model this complex real-world decision making problem (with thousands of taxi drivers) as a multi-stage stochastic congestion game with a non-dedicated set of agents (i.e., agents start operation at a random stage and exit the game after a fixed time), where there is a dynamic population of agents (constrained by the maximum number of drivers). We provide planning and learning methods for computing the ideal operating hours in such a game, so as to improve the efficiency of the overall fleet. In our experimental results, we demonstrate that our planning based approach provides up to 16% improvement in revenue over the existing method on a real-world taxi dataset. The learning based approach further improves the performance and achieves up to 10% more revenue than the planning approach.

In the second part of this thesis, we focus on: a) handling the schedule constraints of individual agents (e.g., breaks during work hours) to provide a flexible work schedule for them; and b) providing a scalable solution approach for such large-scale problem settings. We introduce a simulation based (faster) equilibrium computation method that relies on policy imputation. We study and analyze different imputation methods and show that a good imputation method, coupled with a well-designed simulation based best response computation, can help in achieving a better symmetric equilibrium for large-scale systems in a time-efficient manner. We demonstrate that our methods provide significantly better policies than the previous approach in terms of improving individual agent revenue and overall agent availability.

In the third and final part of the thesis, we focus on operational decision making, where we improve system performance by inducing cooperation among selfish agents. Here we focus on the principal-agent problem setting. Principal-agent relationships, where a principal employs several agents to accomplish tasks on its behalf, are prevalent in many domains (e.g., manufacturer-distributors for product distribution, Uber-taxi drivers for transportation, FoodPanda-delivery personnel for food delivery). The principal has a global observation of all the tasks, while agents only have local observations of local tasks. This limited observability, coupled with the selfish interests of agents, results in a misalignment between the principal's and the agents' objectives. We provide Multi-Agent Reinforcement Learning (MARL) approaches for sequentially designing incentives that improve the objectives of both the principal and the agents. We demonstrate that our approaches are able to outperform the state-of-the-art approaches for sequential incentive design on Escape-Room and adapted StarCraft-2 environments.

Degree Awarded

PhD in Computer Science

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Supervisor(s)

VARAKANTHAM, Pradeep Reddy

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
