Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
12-2023
Abstract
Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation decision-making problems. A major challenge in ACRL is ensuring that the agent takes a valid action satisfying the constraints at each RL step. The commonly used approach of placing a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, first we use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution over a latent variable, such as a Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is itself challenging. We develop multiple methods, based on Hamiltonian Monte Carlo and probabilistic sentential decision diagrams, for such action sampling under convex and non-convex constraints. Third, we integrate the learned normalizing flow with the DDPG algorithm. By design, a well-trained normalizing flow transforms the policy output into a valid action without requiring an optimization solver. Empirically, our approach results in significantly fewer constraint violations (up to an order of magnitude on several instances) and is multiple times faster on a variety of continuous control tasks.
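The sketch below is not the authors' implementation; it is a minimal illustration of the core idea described in the abstract: an invertible, differentiable transform maps a latent Gaussian sample into a valid action, so no projection or optimization solver is needed at action-selection time. It assumes the feasible set is a simple box constraint and uses a single RealNVP-style coupling layer followed by a tanh squashing map; all class and variable names are hypothetical.

```python
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """One RealNVP-style coupling layer: invertible and differentiable."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, z):
        z1, z2 = z[:, : self.half], z[:, self.half:]
        scale, shift = self.net(z1).chunk(2, dim=-1)
        y2 = z2 * torch.exp(torch.tanh(scale)) + shift  # invertible affine update
        return torch.cat([z1, y2], dim=-1)


class BoxFlow(nn.Module):
    """Maps latent z ~ N(0, I) to an action inside the box [low, high]^dim."""

    def __init__(self, dim, low, high):
        super().__init__()
        self.coupling = AffineCoupling(dim)
        self.register_buffer("low", torch.as_tensor(low, dtype=torch.float32))
        self.register_buffer("high", torch.as_tensor(high, dtype=torch.float32))

    def forward(self, z):
        y = self.coupling(z)
        # tanh squashes to (-1, 1); rescale into the assumed feasible box.
        return self.low + 0.5 * (torch.tanh(y) + 1.0) * (self.high - self.low)


if __name__ == "__main__":
    dim = 4
    flow = BoxFlow(dim, low=[-1.0] * dim, high=[1.0] * dim)
    # In a FlowPG-style setup, the policy network would produce (or condition) z;
    # here z is simply sampled from a standard Gaussian for illustration.
    z = torch.randn(8, dim)
    actions = flow(z)
    assert torch.all(actions >= -1.0) and torch.all(actions <= 1.0)
    print(actions.shape)  # torch.Size([8, 4])
```

For general convex or non-convex feasible sets, the paper's flow model would instead be trained on samples drawn from the feasible region (e.g., via Hamiltonian Monte Carlo or probabilistic sentential decision diagrams), rather than relying on a closed-form box mapping as in this toy example.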
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, December 10-16
First Page
1
Last Page
15
Publisher
NeurIPS
City or Country
New Orleans
Citation
BRAHMANAGE JANAKA CHATHURANGA THILAKARATHNA; LING, Jiajing; and KUMAR, Akshat.
FlowPG: Action-constrained policy gradient with normalizing flows. (2023). Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, December 10-16. 1-15.
Available at: https://ink.library.smu.edu.sg/sis_research/8551
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.