Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

5-2025

Abstract

Reinforcement learning via supervised learning (RvS) has emerged as a burgeoning paradigm for offline reinforcement learning (RL). While return-conditioned RvS (RvS-R) predominates across a wide range of offline RL datasets, recent findings suggest that goal-conditioned RvS (RvS-G) outperforms it on certain sub-optimal datasets where trajectory stitching is crucial for achieving optimal performance. However, the underlying reasons for this superiority remain insufficiently explored. In this paper, through didactic experiments and theoretical analysis, we show that the proficiency of RvS-G in stitching trajectories arises from its ability to generalize to unknown goals during evaluation. Building on this insight, we introduce a novel RvS-G approach, Spatial Composition RvS (SC-RvS), that strengthens generalization to unknown goals and, in turn, improves trajectory-stitching performance on sub-optimal datasets. Specifically, by harnessing an advantage weight and a maximum-entropy regularized weight, our approach balances optimistic goal sampling with a measured degree of pessimism in action selection relative to existing RvS-G methods. Extensive experiments on D4RL benchmarks show that SC-RvS performs favorably against the baselines in most cases, especially on sub-optimal datasets that demand trajectory stitching.
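To make the weighting idea in the abstract concrete, the sketch below shows one plausible way an advantage weight and a maximum-entropy regularized goal weight could be combined in a goal-conditioned supervised (behavioral-cloning) loss. This is purely illustrative and is not the paper's implementation; all names (`policy`, `advantages`, `goal_logits`, `beta`, `tau`) and the clamping choice are assumptions made for the sketch.

```python
# Illustrative sketch (not the paper's code): weighted goal-conditioned
# behavioral cloning, where each (state, action, goal) sample is weighted by
# an advantage term and a maximum-entropy-style regularized goal weight.

import torch
import torch.nn as nn


def weighted_gcbc_loss(policy: nn.Module,
                       states: torch.Tensor,       # (B, state_dim)
                       actions: torch.Tensor,      # (B, action_dim)
                       goals: torch.Tensor,        # (B, goal_dim)
                       advantages: torch.Tensor,   # (B,) advantage estimates
                       goal_logits: torch.Tensor,  # (B,) unnormalized goal scores
                       beta: float = 1.0,
                       tau: float = 1.0) -> torch.Tensor:
    # Advantage weight: favors actions that improve on the behavior policy;
    # clamping the exponent keeps a degree of pessimism in action selection.
    adv_w = torch.exp(torch.clamp(advantages / beta, max=5.0))

    # Maximum-entropy regularized goal weight: a tempered softmax over goal
    # scores keeps goal sampling optimistic without collapsing onto few goals.
    goal_w = torch.softmax(goal_logits / tau, dim=0) * goal_logits.shape[0]

    # Weighted supervised regression loss on the goal-conditioned policy.
    pred = policy(torch.cat([states, goals], dim=-1))
    per_sample = ((pred - actions) ** 2).mean(dim=-1)
    return (adv_w.detach() * goal_w.detach() * per_sample).mean()
```

Under this reading, the temperature `tau` controls how strongly goal sampling is skewed toward high-scoring goals, while `beta` controls how sharply the advantage weight prefers better-than-average actions.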

Keywords

Goal Conditioned Reinforcement Learning via Supervised Learning, Offline Reinforcement Learning, Sub-optimal Trajectory Stitching

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Sustainability

Publication

AAMAS '25: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, Detroit, USA, May 19-23, 2025

First Page

2290

Last Page

2298

ISBN

9798400714269

Identifier

10.5555/3709347.3743869

Publisher

ACM

City or Country

New York
