Research Collection School Of Computing and Information Systems

IOSTOM: Offline imitation learning from observations via state transition occupancy matching

Quang Anh PHAM, Singapore Management University
BRAHMANAGE JANAKA CHATHURANGA THILAKARATHNA, Singapore Management University
Tien MAI, Singapore Management University
Akshat KUMAR, Singapore Management University

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

12-2025

Abstract

Offline Learning from Observations (LfO) focuses on enabling agents to imitate expert behavior using datasets that contain only expert state trajectories and separate transition data with suboptimal actions. This setting is both practical and critical in real-world scenarios where direct environment interaction or access to expert action labels is costly, risky, or infeasible. Most existing LfO methods attempt to solve this problem through state or state-action occupancy matching. They typically rely on pretraining a discriminator to differentiate between expert and non-expert states, which could introduce errors and instability—especially when the discriminator is poorly trained. While recent discriminator-free methods have emerged, they generally require substantially more data, limiting their practicality in low-data regimes. In this paper, we propose IOSTOM (), a novel offline LfO algorithm designed to overcome these limitations. Our approach formulates a learning objective based on the joint state visitation distribution. A key distinction of IOSTOM is that it first excludes actions entirely from the training objective. Instead, we learn an that models transition probabilities between states, resulting in a more compact and stable optimization problem. To recover the expert policy, we introduce an efficient action inference mechanism that . Extensive empirical evaluations across diverse offline LfO benchmarks show that IOSTOM substantially outperforms state-of-the-art methods, demonstrating both improved performance and data efficiency.

Keywords

Offline learning from observations, Occupancy matching, Discriminator-free imitation learning, State transition modeling, Action inference

Discipline

Artificial Intelligence and Robotics

Areas of Excellence

Digital transformation

Publication

Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, CA, December 2-7

First Page

1

Last Page

36

Publisher

Advances in Neural Information Processing Systems

City or Country

United States

Citation

PHAM, Quang Anh; BRAHMANAGE JANAKA CHATHURANGA THILAKARATHNA; MAI, Tien; and KUMAR, Akshat. IOSTOM: Offline imitation learning from observations via state transition occupancy matching. (2025). Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, CA, December 2-7. 1-36.
Available at: https://ink.library.smu.edu.sg/sis_research/10710

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://openreview.net/pdf?id=OEp1J4V2fN

Included in

Artificial Intelligence and Robotics Commons
