Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

7-2025

Abstract

Real-world decision-making often involves safety constraints that are implicit, non-Markovian, or difficult to specify directly. Standard reinforcement learning (RL) approaches typically assume access to fully specified cost functions and constraint budgets—assumptions that limit their applicability in domains where such structure must instead be inferred from data. This dissertation develops a sequence of methods for learning safety-relevant structure from weak supervision, such as sparse binary feedback on trajectory segments, and using these signals to guide planning and policy optimization.

The first part of the dissertation introduces Iterative Lower-Bound Optimization (ILBO), a method for planning in continuous Markov Decision Processes (MDPs) using deep reactive policies. ILBO provides a stable and sample-efficient framework for optimizing deterministic policies through iterative local improvement, and serves as a foundation for later developments in cost-aware learning. While ILBO assumes access to known reward and transition functions, the remainder of the dissertation relaxes this assumption by learning safety structure directly from data.

Subsequent chapters explore how to recover latent safety constraints from limited feedback. One contribution develops a method to infer non-Markovian constraints from sparse trajectory-level labels in known-dynamics settings. This is extended to unknown environments with the TraCeS algorithm, which actively selects informative trajectories for annotation and learns safety credit to guide safe reinforcement learning. Finally, the RLUC framework addresses the problem of learning unknown non-Markovian constraints in off-policy RL, integrating constraint inference with probabilistic modeling and policy optimization.

Together, these contributions offer a unified framework for constraint learning and safe decision-making from weak supervision, bridging planning and RL under uncertain safety structure.

Degree Awarded

PhD in Computer Science

Discipline

Artificial Intelligence and Robotics | Computer and Systems Architecture

Supervisor(s)

KUMAR, Akshat

First Page

1

Last Page

165

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author

Available for download on Thursday, August 27, 2026
