Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
5-2025
Abstract
Reinforcement learning via supervised learning (RvS) has emerged as a burgeoning paradigm for offline reinforcement learning (RL). While return-conditioned RvS (RvS-R) predominates across a wide range of offline RL datasets, recent findings suggest that goal-conditioned RvS (RvS-G) outperforms it on certain sub-optimal datasets where trajectory stitching is crucial for achieving strong performance. However, the underlying reasons for this superiority remain insufficiently explored. In this paper, through didactic experiments and theoretical analysis, we reveal that the proficiency of RvS-G at stitching trajectories stems from its ability to generalize to unknown goals during evaluation. Building on this insight, we introduce a novel RvS-G approach, Spatial Composition RvS (SC-RvS), which strengthens generalization to unknown goals and, in turn, improves trajectory stitching on sub-optimal datasets. Specifically, by leveraging an advantage weight and a maximum-entropy regularized weight, our approach balances optimistic goal sampling against a measured degree of pessimism in action selection, relative to existing RvS-G methods. Extensive experimental results on D4RL benchmarks show that SC-RvS performs favorably against the baselines in most cases, especially on sub-optimal datasets that demand trajectory stitching.
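The abstract's interplay of an advantage weight (optimistic goal sampling) and a maximum-entropy regularized weight (retaining some pessimism and coverage) can be illustrated with a minimal sketch. The function name, the exact weighting forms, and the temperatures beta and tau below are assumptions for illustration only, not the SC-RvS implementation from the paper:

import numpy as np

def goal_sampling_weights(advantages, goal_counts, beta=1.0, tau=0.5):
    """Hypothetical sketch: blend an advantage weight exp(A/beta), which
    biases sampling toward goals with high estimated advantage (optimism),
    with an entropy-style weight p(g)^(-tau), which down-weights
    over-represented goals to keep the goal distribution broad."""
    w_adv = np.exp(advantages / beta)               # optimistic advantage term
    p_emp = goal_counts / goal_counts.sum()         # empirical goal frequency
    w_ent = np.power(p_emp + 1e-8, -tau)            # entropy-regularization-style term
    w = w_adv * w_ent
    return w / w.sum()                              # normalized sampling distribution

# Toy usage: sample a relabeled goal for one state; the sampled goal would
# then condition a weighted supervised (behavior-cloning) update, not shown.
rng = np.random.default_rng(0)
adv = rng.normal(size=8)                            # hypothetical A(s, g) per candidate goal
counts = rng.integers(1, 20, size=8).astype(float)  # how often each goal appears in the data
probs = goal_sampling_weights(adv, counts)
sampled_goal = rng.choice(8, p=probs)

Under this reading, beta controls how aggressively sampling favors high-advantage goals, while tau tempers that optimism by spreading probability mass over rarer goals; the paper's actual weighting scheme may differ.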
Keywords
Goal Conditioned Reinforcement Learning via Supervised Learning, Offline Reinforcement Learning, Sub-optimal Trajectory Stitching
Discipline
Artificial Intelligence and Robotics
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Sustainability
Publication
AAMAS '25: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, Detroit, USA, May 19 - 23
First Page
2290
Last Page
2298
ISBN
9798400714269
Identifier
10.5555/3709347.3743869
Publisher
ACM
City or Country
New York
Citation
ZANG, Sheng; CAO, Zhiguang; AN, Bo; JAYAVELU, Senthilnath; and LI, Xiaoli.
Enhancing sub-optimal trajectory stitching: Spatial composition RvS for offline RL. (2025). AAMAS '25: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, Detroit, USA, May 19 - 23. 2290-2298.
Available at: https://ink.library.smu.edu.sg/sis_research/10568
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.