Dissertations and Theses Collection (Open Access)

From sparse feedback to sequential decision-making: Learning safety constraints with weak supervision

Siow Meng LOW, Singapore Management University

Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

7-2025

Abstract

Real-world decision-making often involves safety constraints that are implicit, non-Markovian, or difficult to specify directly. Standard reinforcement learning (RL) approaches typically assume access to fully specified cost functions and constraint budgets—assumptions that limit their applicability in domains where such structure must instead be inferred from data. This dissertation develops a sequence of methods for learning safety-relevant structure from weak supervision, such as sparse binary feedback on trajectory segments, and using these signals to guide planning and policy optimization.

The first part of the dissertation introduces a sample-efficient method for planning in continuous Markov Decision Processes (MDPs) using deep reactive policies. This method, Iterative Lower-Bound Optimization (ILBO), provides a stable and sample-efficient framework for optimizing deterministic policies through iterative local improvement, and serves as a foundation for later developments in cost-aware learning. While ILBO assumes access to known reward and transition functions, the remainder of the dissertation relaxes this assumption by learning safety structure directly from data.

Subsequent chapters explore how to recover latent safety constraints from limited feedback. One contribution develops a method to infer non-Markovian constraints from sparse trajectory-level labels in known-dynamics settings. This is extended to unknown environments with the TraCeS algorithm, which actively selects informative trajectories for annotation and learns safety credit to guide safe reinforcement learning. Finally, the RLUC framework addresses the problem of learning unknown non-Markovian constraints in off-policy RL, integrating constraint inference with probabilistic modeling and policy optimization.

Together, these contributions offer a unified framework for constraint learning and safe decision-making from weak supervision, bridging planning and RL under uncertain safety structure.

Degree Awarded

PhD in Computer Science

Discipline

Artificial Intelligence and Robotics | Computer and Systems Architecture

Supervisor(s)

KUMAR, Akshat

First Page

1

Last Page

165

Publisher

Singapore Management University

City or Country

Singapore

Citation

LOW, Siow Meng. From sparse feedback to sequential decision-making: Learning safety constraints with weak supervision. (2025). 1-165.
Available at: https://ink.library.smu.edu.sg/etd_coll/796

Copyright Owner and License

Author

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Download

Available for download on Thursday, August 27, 2026

Included in

Artificial Intelligence and Robotics Commons, Computer and Systems Architecture Commons
