Research Collection School Of Computing and Information Systems

Sampling based approaches for minimizing regret in uncertain Markov Decision Problems (MDPs)

Publication Type

Journal Article

Version

publishedVersion

Publication Date

6-2017

Abstract

Markov Decision Processes (MDPs) are an effective model for representing decision processes in the presence of transitional uncertainty and reward tradeoffs. However, because the transition and reward functions of an MDP are difficult to specify exactly, researchers have proposed uncertain MDP models and robustness objectives for solving them. Most approaches for computing robust policies have focused on maximin policies, which maximize the value in the worst case amongst all realisations of uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an alternative to the maximin objective for robust optimization. However, existing algorithms for minimax regret are restricted to models with uncertainty over rewards only, and they are also limited in their scalability. Therefore, we provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions. We also consider dependence of the uncertainty across different states and decision epochs. We provide a mixed integer linear program formulation for minimizing regret given a set of samples of the transition and reward functions in the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR), that can be optimized in a scalable manner. Specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from the literature. We observe that optimizing the myopic variants of regret, OSR and CEMR, performs better than directly optimizing the regret.
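
As a rough illustration of the sample-based regret objective described in the abstract, the sketch below computes the maximum regret of a policy over a set of sampled transition and reward functions and picks the policy that minimizes it. It is not the paper's mixed integer linear program, nor its CEMR or OSR algorithms: it assumes a tiny finite-horizon problem, restricts attention to deterministic Markov policies, treats each sample as stationary over decision epochs, and simply enumerates every policy. All function names (`optimal_value`, `policy_value`, `all_policies`, `max_regret`, `minimax_regret_policy`) are hypothetical and chosen for this example only.

```python
import itertools
import numpy as np


def optimal_value(T, R, horizon, start):
    """Optimal expected value of one sampled MDP via backward induction.

    T[s, a, s2] is a transition probability, R[s, a] an immediate reward.
    """
    v = np.zeros(T.shape[0])
    for _ in range(horizon):
        q = R + T @ v            # q[s, a] = R[s, a] + sum_s2 T[s, a, s2] * v[s2]
        v = q.max(axis=1)
    return v[start]


def policy_value(T, R, policy, start):
    """Expected value of a deterministic Markov policy on one sample.

    policy[t][s] is the action taken in state s at decision epoch t.
    """
    n_states = T.shape[0]
    v = np.zeros(n_states)
    for rule in reversed(policy):          # evaluate from the last epoch backwards
        q = R + T @ v
        v = q[np.arange(n_states), np.array(rule)]
    return v[start]


def all_policies(n_states, n_actions, horizon):
    """Enumerate every deterministic Markov policy (only viable for tiny problems)."""
    rules = list(itertools.product(range(n_actions), repeat=n_states))
    return itertools.product(rules, repeat=horizon)


def max_regret(policy, samples, start):
    """Largest gap, over all samples, between the sample's optimal value and the policy's value."""
    horizon = len(policy)
    return max(
        optimal_value(T, R, horizon, start) - policy_value(T, R, policy, start)
        for T, R in samples
    )


def minimax_regret_policy(samples, horizon, start):
    """Brute-force search for the policy with the smallest maximum regret over the samples."""
    n_states, n_actions = samples[0][1].shape
    best = min(
        all_policies(n_states, n_actions, horizon),
        key=lambda p: max_regret(p, samples, start),
    )
    return best, max_regret(best, samples, start)


if __name__ == "__main__":
    # Assumed toy data: 3 random samples of a 2-state, 2-action MDP.
    rng = np.random.default_rng(0)
    samples = [
        (rng.dirichlet(np.ones(2), size=(2, 2)), rng.uniform(size=(2, 2)))
        for _ in range(3)
    ]
    policy, regret = minimax_regret_policy(samples, horizon=2, start=0)
    print("policy:", policy, "max regret:", regret)
```

Enumeration grows exponentially with the number of states and the horizon, which is one way to see why the abstract emphasizes scalable formulations and myopic variants of regret.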

Discipline

Artificial Intelligence and Robotics | Theory and Algorithms

Research Areas

Intelligent Systems and Optimization

Publication

Journal of Artificial Intelligence Research

Volume

59

First Page

229

Last Page

264

ISSN

1076-9757

Identifier

10.1613/jair.5242

Publisher

Association for the Advancement of Artificial Intelligence / AI Access Foundation

Citation

AHMED, Asrar; VARAKANTHAM, Pradeep; LOWALEKAR, Meghna; ADULYASAK, Yossiri; and JAILLET, Patrick. Sampling based approaches for minimizing regret in uncertain Markov Decision Problems (MDPs). (2017). Journal of Artificial Intelligence Research. 59, 229-264.
Available at: https://ink.library.smu.edu.sg/sis_research/3937

Copyright Owner and License

Publisher

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1613/jair.5242

Included in

Artificial Intelligence and Robotics Commons, Theory and Algorithms Commons
