Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

3-2025

Abstract

Training generally capable agents in complex environments is a challenging task that involves identifying the “right” environments at the training stage. Recent research has highlighted the potential of the Unsupervised Environment Design framework, which generates environment instances/levels adaptively at the frontier of the agent’s capabilities using regret measures. While regret approaches have shown promise in generating feasible environments, they can produce difficult environments that are challenging for an RL agent to learn from. This is because regret represents the best-case (upper-bound) learning potential and not the actual learning potential of an environment. To address this, we propose an alternative mechanism that employs marginal benefit, focusing on the improvement (in terms of generalized performance) the agent policy gains from a given environment. The advantage of this new mechanism is that it is agent-focused (rather than environment-focused) and generates the “right” environments depending on the agent’s policy. Additionally, to improve the generalizability of the agent, we introduce a representative state diversity metric that aims to generate varied experiences for the agent. Finally, we provide detailed experimental results and ablation analysis to showcase the effectiveness of our methods. We obtain state-of-the-art (SOTA) results among RL-based environment generation methods.
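The abstract contrasts regret, a best-case bound on learning potential, with marginal benefit, the actual improvement in generalized performance from training on a level. The following Python sketch is only a toy illustration of that selection idea under assumptions of my own (the ToyAgent skill model, the learning-rate update, and the level/evaluation difficulties are all hypothetical); it is not the authors' implementation.

```python
# Toy sketch (not the paper's code): pick the candidate level whose short training
# update most improves held-out (generalized) performance for the current agent.
import copy

class ToyAgent:
    """Hypothetical agent: a single 'skill' scalar stands in for a policy."""
    def __init__(self, skill=0.2):
        self.skill = skill

    def returns_on(self, difficulty):
        # Toy success model: return falls off as difficulty moves away from skill.
        return max(0.0, 1.0 - abs(difficulty - self.skill))

    def train_on(self, difficulty, lr=0.3):
        # Toy update (assumption): the agent only learns to the extent it already
        # succeeds on the level, so an impossibly hard level teaches nothing.
        self.skill += lr * self.returns_on(difficulty) * (difficulty - self.skill)

def generalized_return(agent, eval_difficulties):
    # Mean return over a held-out evaluation set, standing in for generalization.
    return sum(agent.returns_on(d) for d in eval_difficulties) / len(eval_difficulties)

def marginal_benefit(agent, level, eval_difficulties):
    # Improvement in generalized return after a trial update on `level`.
    before = generalized_return(agent, eval_difficulties)
    trial = copy.deepcopy(agent)   # score the update without committing to it
    trial.train_on(level)
    return generalized_return(trial, eval_difficulties) - before

agent = ToyAgent()
candidate_levels = [0.25, 0.6, 1.5]   # too easy, at the frontier, too hard
eval_set = [0.2, 0.5, 0.8]            # held-out levels used to measure generalization
best = max(candidate_levels, key=lambda lvl: marginal_benefit(agent, lvl, eval_set))
print("level chosen by marginal benefit:", best)   # prints 0.6 under these toy numbers
```

In this toy setup the too-hard level (1.5) yields zero improvement even though its regret-style best-case potential could look large, while the frontier level (0.6) gives the largest marginal benefit, which is the agent-focused behavior the abstract describes.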

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital Transformation

Publication

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), Philadelphia, Pennsylvania, February 25 - March 4, 2025

Volume

39

First Page

18253

Last Page

18261

Identifier

10.1609/aaai.v39i17.34008

Publisher

AAAI

City or Country

Philadelphia

Additional URL

https://doi.org/10.1609/aaai.v39i17.34008
