Research Collection School Of Computing and Information Systems

Imitating cost-constrained behaviors in reinforcement learning

Qian SHAO, Singapore Management University
Pradeep VARAKANTHAM, Singapore Management University
Shih-Fen CHENG, Singapore Management University

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2024

Abstract

Complex planning and scheduling problems have long been solved using various optimization or heuristic approaches. In recent years, imitation learning, which aims to learn from expert demonstrations, has been proposed as a viable alternative for solving these problems. Generally speaking, imitation learning is designed to learn either the reward (or preference) model or the behavioral policy directly by observing the behavior of an expert. Existing work in imitation learning and inverse reinforcement learning has focused primarily on imitation in unconstrained settings (e.g., no limit on the fuel consumed by the vehicle). However, in many real-world domains, the behavior of an expert is governed not only by reward (or preference) but also by constraints. For instance, decisions for self-driving delivery vehicles depend not only on the route preferences/rewards (which depend on past demand data) but also on the fuel in the vehicle and the time available. In such problems, imitation learning is challenging because decisions are dictated not only by the reward model but also by a cost-constrained model. In this paper, we provide multiple methods that match expert distributions in the presence of trajectory cost constraints through (a) a Lagrangian-based method; (b) meta-gradients that find a good trade-off between maximizing expected return and minimizing constraint violation; and (c) a cost-violation-based alternating gradient. We empirically show that leading imitation learning approaches imitate cost-constrained behaviors poorly and that our meta-gradient-based approach achieves the best performance.

Discipline

Artificial Intelligence and Robotics | Operations Research, Systems Engineering and Industrial Engineering

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the 34th International Conference on Automated Planning and Scheduling, ICAPS 2024: Banff, June 1-6

Volume

34

First Page

514

Last Page

522

ISBN

9781577358893

Identifier

10.1609/icaps.v34i1.31512

Publisher

Association for the Advancement of Artificial Intelligence

City or Country

Banff

Citation

SHAO, Qian; VARAKANTHAM, Pradeep; and CHENG, Shih-Fen. Imitating cost-constrained behaviors in reinforcement learning. (2024). Proceedings of the 34th International Conference on Automated Planning and Scheduling, ICAPS 2024: Banff, June 1-6. 34, 514-522.
Available at: https://ink.library.smu.edu.sg/sis_research/9496

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1609/icaps.v34i1.31512

Included in

Artificial Intelligence and Robotics Commons, Operations Research, Systems Engineering and Industrial Engineering Commons
