Research Collection School Of Computing and Information Systems

SPRINQL : Sub-optimal demonstrations driven offline imitation learning

Minh Huy HOANG, Singapore Management University
Tien MAI, Singapore Management University
Pradeep VARAKANTHAM, Singapore Management University

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

12-2024

Abstract

We focus on offline imitation learning (IL), which aims to mimic an expert's behavior using demonstrations without any interaction with the environment. One of the main challenges in offline IL is the limited support of expert demonstrations, which typically cover only a small fraction of the state-action space. While it may not be feasible to obtain numerous expert demonstrations, it is often possible to gather a larger set of sub-optimal demonstrations. For example, in treatment optimization problems, treatments for different chronic conditions are available from doctors with varying levels of expertise, ranging from treatment specialists and experienced general practitioners to less experienced general practitioners. Similarly, when robots are trained to imitate humans in routine tasks, they might learn from individuals with different levels of expertise and efficiency. In this paper, we propose an offline IL approach that leverages the larger set of sub-optimal demonstrations while effectively mimicking expert trajectories. Existing offline IL methods based on behavior cloning or distribution matching often face issues such as overfitting to the limited set of expert demonstrations or inadvertently imitating sub-optimal trajectories from the larger dataset. Our approach, which is based on inverse soft-Q learning, learns from both expert and sub-optimal demonstrations. It assigns higher importance (through learned weights) to aligning with expert demonstrations and lower importance to aligning with sub-optimal ones. A key contribution of our approach, called SPRINQL, is transforming the offline IL problem into a convex optimization over the space of Q functions. Through comprehensive experimental evaluations, we demonstrate that the SPRINQL algorithm achieves state-of-the-art (SOTA) performance on offline IL benchmarks.
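Inverse soft-Q learning, on which the approach builds, recovers a reward from a Q function through the inverse soft Bellman relation r(s, a) = Q(s, a) - γ V(s'), where V(s) = log Σ_a exp Q(s, a). The sketch below illustrates that relation together with a per-dataset weighting that gives expert demonstrations more importance than sub-optimal ones. It is an illustration under stated assumptions, not the SPRINQL objective: it assumes a small tabular Q stored as a NumPy array, a soft-value temperature of 1, and hand-set weights (SPRINQL learns its weights). The function names soft_value, implied_reward and weighted_reward_score are hypothetical.

```python
import numpy as np


def soft_value(q: np.ndarray) -> np.ndarray:
    """Soft state value V(s) = log sum_a exp Q(s, a) with temperature 1.

    q has shape (num_states, num_actions); the row max is subtracted
    before exponentiating for numerical stability.
    """
    m = q.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(q - m).sum(axis=1, keepdims=True))).squeeze(1)


def implied_reward(q, s, a, s_next, gamma):
    """Reward implied by Q via the inverse soft Bellman relation.

    r(s, a) = Q(s, a) - gamma * V(s'); terminal transitions are not handled.
    """
    v = soft_value(q)
    return q[s, a] - gamma * v[s_next]


def weighted_reward_score(q, datasets, weights, gamma=0.99):
    """Hypothetical weighted average of mean implied rewards across datasets.

    datasets: list of (states, actions, next_states) index arrays,
    e.g. [expert, sub_optimal]. weights: one non-negative weight per
    dataset, fixed by hand here (SPRINQL instead learns its weights).
    """
    total = 0.0
    for (s, a, s_next), w in zip(datasets, weights):
        total += w * implied_reward(q, s, a, s_next, gamma).mean()
    return total / sum(weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(4, 2))  # toy Q table: 4 states, 2 actions
    expert = (np.array([0, 1]), np.array([1, 1]), np.array([1, 2]))
    sub_optimal = (np.array([0, 2, 3]), np.array([0, 0, 1]), np.array([2, 3, 0]))
    # Expert transitions receive a larger (hand-chosen) weight than sub-optimal ones.
    print(weighted_reward_score(q, [expert, sub_optimal], weights=[1.0, 0.3], gamma=0.9))
```

A complete inverse soft-Q objective also contains a soft-value term over initial or visited states and typically a regularizer; both are omitted here so that only the reward relation and the dataset weighting remain visible.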

Keywords

Sub-optimal demonstrations, Imitation learning, Inverse Q learning, Reinforcement learning

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) : Vancouver, Canada, December 10-15

Publisher

NeurIPS

City or Country

Vancouver, Canada

Citation

HOANG, Minh Huy; MAI, Tien; and VARAKANTHAM, Pradeep. SPRINQL : Sub-optimal demonstrations driven offline imitation learning. (2024). Proceedings of 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) : Vancouver, Canada, December 10-15.
Available at: https://ink.library.smu.edu.sg/sis_research/9821

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Comments

PDF provided by faculty

Included in

Artificial Intelligence and Robotics Commons
