Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

11-2025

Abstract

Multi-agent systems (MAS) involve multiple autonomous agents that coordinate their actions to achieve shared or competing objectives in dynamic environments. Over the past decade, multi-agent reinforcement learning (MARL) has emerged as a powerful paradigm for enabling collaborative behaviors among autonomous agents within MAS to solve complex tasks. This dissertation addresses a critical scalability gap between current MARL capabilities and the requirements of real-world deployment. Most existing MARL research focuses on small-scale laboratory problems, and the resulting methods often struggle to coordinate large agent populations and to handle extended decision-making horizons. In contrast, many real-world applications demand coordination among hundreds or thousands of agents over planning horizons that may span thousands of timesteps, or that operate as lifelong systems. This disparity has created a bifurcated research landscape: general-purpose MARL algorithms are typically evaluated only on small-scale problems, leaving their scalability properties unknown, while methods that address large-scale coordination are usually domain-specific solutions (e.g., traffic management, smart grid control, autonomous vehicle fleets) with limited transferability across domains.

The fundamental question addressed in this dissertation is: how can we design multi-agent learning systems that scale simultaneously to large agent teams and to extended temporal horizons while remaining general enough for practical deployment? In response, this dissertation establishes that hierarchical heterogeneous modular architectures, which combine self-organizing neural networks, language models, and deep reinforcement learning networks, provide a unified solution to both structural scalability (increasing the number of agents) and temporal scalability (extending planning horizons) in multi-agent systems. Critically, these scalability improvements are intrinsically linked to generalizability: hierarchical decomposition enables learned policies to transfer across different team sizes and task complexities, while the integration of pre-trained modular components provides zero-shot generalization to novel scenarios without domain-specific retraining. Through rigorous theoretical analysis and extensive empirical validation, this work develops concrete solutions that enable MAS to scale to large agent teams while extending planning horizons to address long-horizon tasks. Together, these results demonstrate that true scalability inherently requires the ability to generalize across diverse environmental conditions and agent compositions.

This dissertation makes four interconnected contributions that collectively advance the field from recognizing scalability limitations to implementing practical multi-agent frameworks. First, it establishes theoretical foundations through a novel four-paradigm taxonomy that categorizes MARL methods by their external control architectures and internal policy structures, revealing that approaches featuring hierarchical control with hierarchical policies (HC-HP) constitute the natural evolutionary pathway to scalability. This taxonomic framework offers a structured perspective on multi-agent system design and shows that successful scaling requires integrating diverse computational paradigms within unified hierarchical frameworks.

Second, the Multi-Objective StarCraft Multi-agent Challenge (MOSMAC) benchmark establishes the first comprehensive evaluation framework for long-horizon, multi-objective MARL. Evaluations on MOSMAC expose the coordination complexities that existing MARL methods face: current MARL algorithms fall short in long-horizon multi-objective scenarios, and independent learning can outperform centralized methods in certain situations.

Third, the dissertation presents the HiSOMA framework, which addresses structural scalability by integrating self-organizing neural networks with MARL policy networks. HiSOMA achieves significant performance improvements over traditional flat multi-agent architectures on complex coordination tasks. The framework’s three-level hierarchy addresses the curse of dimensionality by decomposing large-scale coordination problems into manageable mini-MAS modules, while the self-organizing neural networks preserve interpretability, a critical requirement for real-world deployment.

Fourth, the L2M2 framework addresses temporal scalability through a novel integration of large language models with MARL. L2M2 achieves performance comparable to baseline methods while requiring significantly fewer training samples, demonstrating that language models can effectively enhance MAS coordination through zero-shot planning. This gain in sample efficiency directly addresses one of the primary barriers to real-world MARL deployment: the prohibitive computational cost of training large-scale multi-agent systems.

The hierarchical principle is the unifying thread across the framework contributions of this dissertation. HiSOMA implements spatial hierarchical control by organizing agents into multi-level autonomous modules, while L2M2 implements temporal hierarchical decomposition by using language models for high-level planning and task decomposition. Both frameworks keep hierarchical decision-making interpretable, HiSOMA through cognitive codes and L2M2 through natural-language reasoning, which is essential for real-world deployment.

This dissertation arrives at a critical juncture in multi-agent systems research, as language models begin to reveal their potential as agent policies. It lays foundations for large-scale systems that can dynamically adapt their organizational structures, leverage diverse agent policies with heterogeneous learning algorithms, and operate reliably in complex, long-horizon environments, thereby bridging the gap between theoretical MARL research and the requirements of real-world multi-agent systems.

Degree Awarded

PhD in Computer Science

Discipline

Artificial Intelligence and Robotics

Supervisor(s)

TAN, Ah Hwee

First Page

1

Last Page

198

Publisher

Singapore Management University

City or Country

Singapore

Citation

GENG, Minghong. Scaling up cooperative multi-agent reinforcement learning. (2025). 1-198.
Available at: https://ink.library.smu.edu.sg/etd_coll/818

Copyright Owner and License

Author

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Dissertations and Theses Collection (Open Access)

Scaling up cooperative multi-agent reinforcement learning