Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
3-2022
Abstract
Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven effective for optimizing DRPs in nonlinear continuous MDPs, but it requires a large number of sampled trajectories to learn effectively and can suffer from high variance in solution quality. In this work, we revisit the overall model-based DRP objective and instead take a minorization-maximization perspective to iteratively optimize the DRP w.r.t. a locally tight lower-bounded objective. This novel formulation of DRP learning as iterative lower bound optimization (ILBO) is particularly appealing because (i) each step is structurally easier to optimize than the overall objective, (ii) it guarantees a monotonically improving objective under certain theoretical conditions, and (iii) it reuses samples between iterations, thus lowering sample complexity. Empirical evaluation confirms that ILBO is significantly more sample-efficient than the state-of-the-art DRP planner and consistently produces better solution quality with lower variance. We additionally demonstrate that ILBO generalizes well to new problem instances (i.e., different initial states) without requiring retraining.
Discipline
Artificial Intelligence and Robotics
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 36th AAAI Conference on Artificial Intelligence, Virtual, Vancouver, Canada, 2022 February 22 - March 1.
First Page
9840
Last Page
9848
ISBN
9781577358763
Publisher
AAAI Press
City or Country
Palo Alto, California USA
Citation
LOW, Siow Meng; KUMAR, Akshat; and SANNER, Scott.
Sample-efficient iterative lower bound optimization of deep reactive policies for planning in continuous MDPs. (2022). Proceedings of the 36th AAAI Conference on Artificial Intelligence, Virtual, Vancouver, Canada, 2022 February 22 - March 1. 9840-9848.
Available at: https://ink.library.smu.edu.sg/sis_research/7724
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://www.aaai.org/Library/AAAI/aaai22contents.php#:~:text=Published%20by%20the-,AAAI%20Press,-%2C%20Palo%20Alto%2C%20California