Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2014

Abstract

Finding optimal policies for Markov Decision Processes with large state spaces is, in general, intractable. Nonetheless, simulation-based algorithms inspired by Sparse Sampling (SS), such as Upper Confidence Bound applied in Trees (UCT) and Forward Search Sparse Sampling (FSSS), have been shown to perform reasonably well in both theory and practice, despite their high computational demand. To improve the efficiency of these algorithms, we adopt a simple enhancement technique that uses a heuristic policy to speed up the selection of optimal actions. The general method, called Aux, augments the look-ahead tree with auxiliary arms that are evaluated by the heuristic policy. In this paper, we provide theoretical justification for the method and demonstrate its effectiveness on two experimental benchmarks that showcase faster convergence to a near-optimal policy for both SS and FSSS. Moreover, to further speed up the convergence of these algorithms at the early stage, we present a novel mechanism to combine them with UCT so that the resulting hybrid algorithm is superior to both of its components.
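
The idea of an auxiliary arm can be illustrated with a minimal sketch. This is only one plausible reading of the abstract and not the authors' algorithm: it assumes a plain Sparse Sampling recursion in which each node also evaluates one extra arm by rolling out the heuristic policy, and lets that arm compete with the ordinary action arms. All names here (`sparse_sampling_aux`, `rollout`, `toy_model`) and the toy problem are hypothetical.

```python
import random


def rollout(model, policy, state, depth, gamma, rng):
    """Sample one discounted return from following `policy` for `depth` steps."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        state, reward = model(state, policy(state), rng)
        total += discount * reward
        discount *= gamma
    return total


def sparse_sampling_aux(model, actions, policy, state, depth, width, gamma, rng):
    """Return (value estimate, chosen action) for `state`.

    Ordinary arms: each action is sampled `width` times and evaluated
    recursively, as in Sparse Sampling. Auxiliary arm (assumption): the mean
    of `width` rollouts of the heuristic `policy` from `state`.
    """
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in actions:
        q = 0.0
        for _ in range(width):
            next_state, reward = model(state, action, rng)
            v, _ = sparse_sampling_aux(
                model, actions, policy, next_state, depth - 1, width, gamma, rng
            )
            q += reward + gamma * v
        q /= width
        if q > best_value:
            best_value, best_action = q, action
    # Auxiliary arm: if the heuristic rollouts look better than every sampled
    # action, fall back to the action the heuristic would take here.
    aux = sum(
        rollout(model, policy, state, depth, gamma, rng) for _ in range(width)
    ) / width
    if aux > best_value:
        best_value, best_action = aux, policy(state)
    return best_value, best_action


def toy_model(state, action, rng):
    """Hypothetical deterministic chain: moving right (+1) pays reward 1."""
    return state + action, (1.0 if action == 1 else 0.0)


if __name__ == "__main__":
    value, action = sparse_sampling_aux(
        toy_model, (-1, 1), lambda s: 1, state=0,
        depth=2, width=2, gamma=0.5, rng=random.Random(0),
    )
    print(value, action)
```

In this sketch the auxiliary arm costs only `width` rollouts per node, whereas each ordinary arm expands a full subtree, which is the intuition behind using a cheap heuristic to bootstrap the search. How the paper actually places and evaluates auxiliary arms, and how it combines them with FSSS and UCT, should be taken from the paper itself.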

Keywords

markov decision process, sparse sampling, forward sparse sampling, uct, heuristic

Discipline

Theory and Algorithms

Publication

Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS 2014)

First Page

190

Last Page

198

ISBN

908781135

Publisher

AAAI Press

City or Country

Portsmouth, USA

Citation

Nguyen T., Silander T., Lee W., and Tze-Yun LEONG. Bootstrapping simulation-based algorithms with a suboptimal policy. (2014). Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS 2014). 190-198.
Available at: https://ink.library.smu.edu.sg/sis_research/3000

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://www.aaai.org/ocs/index.php/ICAPS/ICAPS14/paper/view/7934/8027

Included in

Theory and Algorithms Commons

Research Collection School Of Computing and Information Systems

Bootstrapping simulation-based algorithms with a suboptimal policy
