Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
2-2024
Abstract
To train generalizable Reinforcement Learning (RL) agents, researchers recently proposed the Unsupervised Environment Design (UED) framework, in which a teacher agent creates a very large number of training environments and a student agent trains on the experiences in these environments to become robust against unseen testing scenarios. For example, to train a student to master the “stepping over stumps” task, the teacher will create numerous training environments with varying stump heights and shapes. In this paper, we argue that UED neglects training efficiency and that its need for a very large number of environments (henceforth referred to as infinite horizon training) makes it less suitable for training robots and non-expert humans. In real-world applications where either creating new training scenarios is expensive or training efficiency is of critical importance, we want to maximize both the learning efficiency and the learning outcome of the student. To achieve efficient finite horizon training, we propose a novel Markov Decision Process (MDP) formulation for the teacher agent, referred to as Unsupervised Training Sequence Design (UTSD). Specifically, we encode salient information from the student policy (e.g., behaviors and learning progress) into the teacher's state space, enabling the teacher to closely track the student's learning progress and consequently discover optimal training sequences of finite length. Additionally, we explore the teacher's efficient adaptation to unseen students at test time by employing a context-based meta-learning approach, which leverages the teacher's past experiences with various students. Finally, we empirically demonstrate our teacher's capability to design efficient and effective training sequences for students with varying capabilities.
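To make the UTSD idea of encoding student information into the teacher's state more concrete, below is a minimal illustrative sketch in Python. All names (StudentSnapshot, teacher_observation, propose_next_env, stump_height) are hypothetical and not taken from the paper; it simply shows one plausible way a teacher's observation could combine student behavior features, an estimate of learning progress, and the remaining finite training budget.

# Illustrative sketch only; assumes a 1-D behavior embedding and a fixed
# evaluation window, neither of which is specified by the paper.
from dataclasses import dataclass
import numpy as np


@dataclass
class StudentSnapshot:
    behavior_embedding: np.ndarray   # e.g., rollout statistics of the student policy
    recent_returns: np.ndarray       # returns over the last k evaluation episodes


def teacher_observation(snapshot: StudentSnapshot, step: int, horizon: int) -> np.ndarray:
    """Build the teacher's state from salient student information.

    Learning progress is approximated here by the slope of recent returns;
    the remaining-budget fraction reflects the finite training horizon.
    """
    progress = np.polyfit(np.arange(len(snapshot.recent_returns)),
                          snapshot.recent_returns, deg=1)[0]
    budget_left = (horizon - step) / horizon
    return np.concatenate([snapshot.behavior_embedding, [progress, budget_left]])


def propose_next_env(obs: np.ndarray) -> dict:
    """Map the teacher's observation to the next environment's parameters.

    Placeholder heuristic (a learned teacher policy would replace this):
    raise the stump height when the student's progress slope is positive.
    """
    progress = obs[-2]
    return {"stump_height": float(np.clip(0.5 + 2.0 * progress, 0.0, 3.0))}

In an actual UTSD setup, the mapping from observation to environment parameters would be a learned teacher policy rather than the heuristic above, and the teacher would be rewarded for the student's final performance within the finite horizon.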
Discipline
Artificial Intelligence and Robotics | Numerical Analysis and Scientific Computing
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the AAAI Conference on Artificial Intelligence 2024: Vancouver, February 20-27: Proceedings
First Page
13637
Last Page
13645
Identifier
10.1609/aaai.v38i12.29268
Publisher
AAAI Press
City or Country
Briarcliff Manor, NY
Citation
LI, Wenjun and VARAKANTHAM, Pradeep.
Unsupervised training sequence design: Efficient and generalizable agent training. (2024). Proceedings of the AAAI Conference on Artificial Intelligence 2024: Vancouver, February 20-27: Proceedings. 13637-13645.
Available at: https://ink.library.smu.edu.sg/sis_research/9362
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1609/aaai.v38i12.29268
Included in
Artificial Intelligence and Robotics Commons, Numerical Analysis and Scientific Computing Commons