Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

3-2025

Abstract

In many RL applications, ensuring that an agent's actions adhere to constraints is crucial for safety. Most previous methods in Action-Constrained Reinforcement Learning (ACRL) employ a projection layer after the policy network to correct the action. However, projection-based methods suffer from issues such as the zero-gradient problem and higher runtime due to their use of optimization solvers. Recently, methods were proposed that train generative models to learn a differentiable mapping between latent variables and feasible actions to address these issues. However, generative models require training on samples from the constrained action space, and generating such samples is itself challenging. To address these limitations, first, we define a target distribution for feasible actions based on constraint violation signals, and train normalizing flows by minimizing the KL divergence between an approximated distribution over feasible actions and the target. This eliminates the need to generate feasible action samples, greatly simplifying the flow model learning. Second, we integrate the learned flow model with existing deep RL methods, which restricts exploration to the feasible action space. Third, we extend our approach beyond ACRL to handle state-wise constraints by learning the constraint violation signal from the environment. Empirically, our approach incurs significantly fewer constraint violations than the previous best methods while achieving similar or better quality in several control tasks.
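
The first step described in the abstract, fitting a flow against a target built from constraint violation signals rather than from feasible samples, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the violation signal (distance outside an L2 ball), the unnormalized target form exp(-penalty * violation), the one-layer affine flow, and the names `constraint_violation` and `reverse_kl_estimate`. The point it shows is that the reverse KL objective only needs samples drawn from the flow itself.

```python
import math
import random


def constraint_violation(action, radius=1.0):
    """Hypothetical violation signal: distance by which the action lies outside an L2 ball."""
    norm = math.sqrt(sum(x * x for x in action))
    return max(0.0, norm - radius)


def reverse_kl_estimate(shift, log_scale, penalty=10.0, n_samples=2000, seed=0):
    """Monte Carlo estimate of KL(q || p), up to the unknown log-normalizer of p.

    q is the density of a = shift + exp(log_scale) * z with z ~ N(0, I),
    i.e. a one-layer affine flow (an illustrative stand-in for a real flow).
    The unnormalized target is assumed to be
    log p~(a) = -penalty * constraint_violation(a).
    """
    rng = random.Random(seed)
    dim = len(shift)
    total = 0.0
    for _ in range(n_samples):
        # Sample from the base distribution and push it through the flow.
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        a = [m + math.exp(s) * zi for m, s, zi in zip(shift, log_scale, z)]
        # Change of variables: log q(a) = log N(z; 0, I) - sum(log_scale).
        log_base = -0.5 * sum(zi * zi for zi in z) - 0.5 * dim * math.log(2.0 * math.pi)
        log_q = log_base - sum(log_scale)
        # Only the violation signal is queried; no feasible samples are needed.
        log_target = -penalty * constraint_violation(a)
        total += log_q - log_target
    return total / n_samples


# A flow concentrated inside the feasible ball versus one centered outside it.
inside = reverse_kl_estimate([0.0, 0.0], [math.log(0.3)] * 2)
outside = reverse_kl_estimate([3.0, 0.0], [math.log(0.3)] * 2)
print(inside < outside)
```

In this sketch the objective is lower for the flow whose samples fall inside the feasible set, so a gradient-based optimizer minimizing it would move probability mass toward feasible actions. A practical implementation would use a differentiable framework and a more expressive flow.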

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Sustainability

Publication

Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI‑25), Philadelphia, Pennsylvania, February 25 - March 4

First Page

15614

Last Page

15621

Publisher

AAAI

City or Country

Philadelphia, Pennsylvania

Citation

BRAHMANAGE, Janaka Chathuranga; LING, Jiajing; and KUMAR, Akshat. Leveraging constraint violation signals for action constrained reinforcement learning. (2025). Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI‑25), Philadelphia, Pennsylvania, February 25 - March 4. 15614-15621.
Available at: https://ink.library.smu.edu.sg/sis_research/10666

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://ojs.aaai.org/index.php/AAAI/article/view/33714/35869

Included in

Artificial Intelligence and Robotics Commons

Research Collection School Of Computing and Information Systems

Leveraging constraint violation signals for action constrained reinforcement learning
