Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

11-2025

Abstract

Multi-agent systems (MAS) involve multiple autonomous agents that coordinate their actions to achieve shared or competing objectives in dynamic environments. Over the past decade, multi-agent reinforcement learning (MARL) has emerged as a powerful paradigm for enabling collaborative behaviors among autonomous agents within MAS to solve complex tasks. This dissertation addresses a critical scalability gap between current MARL capabilities and real-world deployment requirements. Most existing MARL research focuses on small-scale laboratory problems, and the resulting methods often struggle to coordinate large agent populations and to handle extended decision-making horizons. In contrast, many real-world applications demand coordination among hundreds or thousands of agents over planning horizons that may span thousands of timesteps, or that never end in the case of life-long systems. This disparity has created a bifurcated research landscape: general-purpose MARL algorithms are typically evaluated only on small-scale problems, leaving their scalability properties unknown, while methods that address large-scale coordination are usually domain-specific solutions (e.g., traffic management, smart grid control, autonomous vehicle fleets) with limited transferability across domains.

The fundamental question addressed in this dissertation is: How can we design multi-agent learning systems that simultaneously scale to large agent teams and extended temporal horizons while maintaining generalizability for practical deployment? In response, this dissertation establishes that hierarchical heterogeneous modular architectures, which combine self-organizing neural networks, language models, and deep reinforcement learning networks, provide a unified solution to both structural scalability (increasing agent numbers) and temporal scalability (extending planning horizons) in multi-agent systems. Critically, these scalability improvements are intrinsically linked to generalizability: hierarchical decomposition enables learned policies to transfer across different team sizes and task complexities, while the integration of pre-trained modular components provides zero-shot generalization to novel scenarios without domain-specific retraining. Through rigorous theoretical analysis and extensive empirical validation, this work develops concrete solutions that enable MAS to scale both to large agent teams and to long-horizon tasks. Together, these results demonstrate that true scalability inherently requires generalization across diverse environmental conditions and agent compositions.

This dissertation makes four interconnected contributions that collectively advance the field from recognizing scalability limitations to implementing practical multi-agent frameworks. First, it establishes theoretical foundations through a novel four-paradigm taxonomy that categorizes MARL methods by external control architectures and internal policy structures, revealing that approaches featuring hierarchical control with hierarchical policies (HC-HP) constitute the natural evolutionary pathway to scalability. This taxonomic framework offers a structured perspective on multi-agent system design and shows that successful scaling requires integrating diverse computational paradigms within unified hierarchical frameworks.

Second, the Multi-Objective StarCraft Multi-agent Challenge (MOSMAC) benchmark establishes the first comprehensive evaluation framework for long-horizon, multi-objective MARL. MOSMAC provides important insights into existing MARL methods and the coordination complexities that these settings entail. It demonstrates that current MARL algorithms fall short in long-horizon multi-objective scenarios and that independent learning can outperform centralized methods in certain situations.

Third, the dissertation presents the HiSOMA framework, which addresses structural scalability by integrating self-organizing neural networks with MARL policy networks. HiSOMA achieves significant performance improvements over traditional flat multi-agent architectures on complex coordination tasks. The framework's three-level hierarchy addresses the curse of dimensionality by decomposing large-scale coordination problems into manageable mini-MAS modules, while the self-organizing neural networks preserve interpretability, a critical requirement for real-world deployment.

Fourth, the L2M2 framework addresses temporal scalability through the novel integration of large language models with MARL. L2M2 achieves performance comparable to baseline methods while requiring significantly fewer training samples, demonstrating that language models can effectively enhance MAS coordination through zero-shot planning. This significant improvement in sample efficiency directly addresses one of the primary barriers to real-world MARL deployment: the prohibitive computational cost of training large-scale multi-agent systems.

The hierarchical principle serves as the unifying thread across the framework contributions in this dissertation. HiSOMA demonstrates spatial hierarchical control by organizing agents into multi-level autonomous modules, while L2M2 implements temporal hierarchical decomposition by leveraging language models for high-level planning and task decomposition. Both frameworks keep their hierarchical decision-making interpretable (HiSOMA through cognitive codes and L2M2 through natural-language reasoning), which real-world deployment requires.

This dissertation arrives at a defining moment in multi-agent systems research, as language models begin to reveal their potential as agent policies. It contributes to the foundations for large-scale systems that can dynamically adapt their organizational structures, leverage diverse agent policies with heterogeneous learning algorithms, and operate reliably in long-horizon complex environments, thus bridging the gap between theoretical MARL research and real-world multi-agent system requirements.

Degree Awarded

PhD in Computer Science

Discipline

Artificial Intelligence and Robotics

Supervisor(s)

TAN, Ah Hwee

First Page

1

Last Page

198

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author

Available for download on Friday, September 04, 2026
