Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

1-2026

Abstract

Real-world decision-making systems such as autonomous driving and large-scale ride-pooling must operate under strict safety and resource constraints. Traditional Reinforcement Learning (RL) methods, while powerful in simulation, often fail to guarantee such constraints, limiting their real-world deployment. The fundamental challenge lies in integrating constraint satisfaction with long-term reward optimization, especially when outcomes are stochastic and interdependent across multiple agents.

This dissertation advances the field of Constrained Reinforcement Learning (CRL) from both single-agent safety and multi-agent coordination perspectives. In the single-agent setting, we introduce a Reward Penalty framework that augments the state space with cumulative cost and penalizes only trajectories that violate constraints. This formulation unifies different constraint types (expectation, chance, and CVaR) and enables safe variants of standard RL algorithms such as DQN and SAC, achieving faster convergence and stronger safety enforcement than existing primal–dual methods.
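
As a rough illustration of the state-augmentation idea described above, the following minimal sketch tracks cumulative cost alongside the environment state and subtracts a penalty only when the accumulated cost exceeds a budget. It is not the dissertation's implementation: the names `AugmentedState`, `penalized_step`, `toy_step`, `budget`, and `penalty` are hypothetical, and applying a fixed penalty at every step spent over budget is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical step signature: (state, action) -> (next_state, reward, cost, done)
StepFn = Callable[[int, int], Tuple[int, float, float, bool]]


@dataclass(frozen=True)
class AugmentedState:
    state: int        # original environment state
    cum_cost: float   # cost accumulated so far along the trajectory


def penalized_step(step_fn: StepFn, aug: AugmentedState, action: int,
                   budget: float, penalty: float):
    """Advance one step and return (augmented next state, shaped reward, done)."""
    next_state, reward, cost, done = step_fn(aug.state, action)
    cum_cost = aug.cum_cost + cost
    # Assumption: a fixed penalty applies only while the trajectory is over budget.
    if cum_cost > budget:
        reward -= penalty
    return AugmentedState(next_state, cum_cost), reward, done


def toy_step(state: int, action: int):
    """Toy chain: action 1 moves forward (reward 1, cost 1); action 0 waits."""
    if action == 1:
        return state + 1, 1.0, 1.0, state + 1 >= 3
    return state, 0.0, 0.0, False


def rollout(actions: Sequence[int], budget: float, penalty: float):
    """Run a fixed action sequence and return (final augmented state, return)."""
    aug, total = AugmentedState(0, 0.0), 0.0
    for action in actions:
        aug, reward, done = penalized_step(toy_step, aug, action, budget, penalty)
        total += reward
        if done:
            break
    return aug, total
```

Because the cumulative cost is part of the state, a standard value-based or actor-critic learner can condition on how much budget remains without a separate dual variable; in this sketch, trajectories that stay within budget keep their original rewards unchanged.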

In the multi-agent setting, motivated by the on-demand ride-pooling problem, we propose Hierarchical Value Decomposition (HIVES) to capture large-scale agent interactions through hierarchical mixing networks. Building upon HIVES, we further develop FlexiPool to handle flexible pickup and drop-off points, and Pricing RL to jointly optimize matching and pricing for long-term revenue.

These contributions form the foundation for safe and scalable reinforcement learning in complex, constrained environments. The thesis envisions future CRL systems that integrate multi-agent coordination, safety guarantees, and economic reasoning to enable sustainable and intelligent decision-making in real-world mobility and beyond.

Degree Awarded

PhD in Computer Science

Discipline

Artificial Intelligence and Robotics | Computer Sciences

Supervisor(s)

VARAKANTHAM, Pradeep Reddy

Publisher

Singapore Management University

City or Country

Singapore

Citation

JIANG, Hao. Constrained reinforcement learning: from single-agent safety to multi-agent coordination. (2026). 1-95.
Available at: https://ink.library.smu.edu.sg/etd_coll/832

Copyright Owner and License

Author

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons

Dissertations and Theses Collection (Open Access)

Constrained reinforcement learning: from single-agent safety to multi-agent coordination
