Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

12-2023

Abstract

Offline Reinforcement Learning (RL) has demonstrated promising results in various applications by learning policies from previously collected datasets, reducing the need for online exploration and interaction. However, real-world scenarios usually involve partial observability, which poses two crucial challenges for the deployment of offline RL methods: i) a policy trained on data with full observability is not robust against masked observations during execution, and ii) which parts of the observations will be masked is usually unknown during training. To address these challenges, we present Offline RL with DiscrEte pRoxy representations (ORDER), a probabilistic framework that leverages novel state representations to improve robustness against diverse masked observabilities. Specifically, we propose a discrete representation of the states and use a proxy representation to recover the states from masked, partially observable trajectories. The training of ORDER can be compactly described in three steps: i) learning the discrete state representations on data with full observations, ii) training the decision module based on the discrete representations, and iii) training the proxy discrete representations on data with various partial observations, aligning them with the discrete representations. We conduct extensive experiments to evaluate ORDER, showcasing its effectiveness in offline RL for diverse partially observable scenarios and highlighting the significance of discrete proxy representations for generalization performance. ORDER is a flexible framework that can employ any offline RL algorithm, and we hope that it can pave the way for deploying RL policies under various forms of partial observability in the real world.

Discipline

Artificial Intelligence and Robotics | Numerical Analysis and Scientific Computing | Theory and Algorithms

Areas of Excellence

Digital transformation

Publication

Proceedings of the 37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, December 10-16

Volume

36

First Page

1

Last Page

13

Publisher

Neural Information Processing Systems Foundation

City or Country

San Diego

Citation

GU, Pengjie; CAI, Xinyu; XING, Dong; WANG, Xinrun; ZHAO, Mengchen; and AN, Bo. Offline RL with discrete proxy representations for generalizability in POMDPs. (2023). Proceedings of the 37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, December 10-16. 36, 1-13.
Available at: https://ink.library.smu.edu.sg/sis_research/9048

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons, Numerical Analysis and Scientific Computing Commons, Theory and Algorithms Commons

Research Collection School Of Computing and Information Systems

Offline RL with discrete proxy representations for generalizability in POMDPs
