Publication Type
Journal Article
Version
acceptedVersion
Publication Date
11-2020
Abstract
The overestimation caused by function approximation is a well-known property of Q-learning algorithms, especially single-critic models, and it leads to poor performance in practical tasks. However, the opposite property, underestimation, which often occurs in Q-learning methods with double critics, has been largely left untouched. In this article, we investigate the underestimation phenomenon in the recent twin delayed deep deterministic (TD3) actor-critic algorithm and theoretically demonstrate its existence. We also observe in various experiments that this underestimation bias does hurt performance. Considering the opposite properties of single-critic and double-critic methods, we propose a novel triplet-average deep deterministic policy gradient algorithm that takes the weighted action value of three target critics to reduce the estimation bias. Given the connection between estimation bias and approximation error, we suggest averaging previous target values to reduce per-update error and further improve performance. Extensive empirical results over various continuous control tasks in OpenAI Gym show that our approach outperforms the state-of-the-art methods. Source code is available at https://github.com/shenjianbing/TADDRL.
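The two ideas named in the abstract, a weighted value over three target critics and an average over previous target values, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the names triplet_target and averaged_value, the weight beta, and the choice to blend the minimum of two critics with a third critic are hypothetical and are not taken from the article or its source code.

```python
from typing import Sequence


def triplet_target(reward: float, q1: float, q2: float, q3: float,
                   beta: float = 0.5, gamma: float = 0.99,
                   done: bool = False) -> float:
    """Hypothetical bootstrapped target built from three target-critic values.

    Assumption: the pessimistic double-critic estimate min(q1, q2) is blended
    with a third, single-critic estimate q3 using a weight beta in [0, 1].
    """
    # The minimum of two critics tends to underestimate the action value.
    pessimistic = min(q1, q2)
    # A single critic tends to overestimate; weighting the two trades off the biases.
    blended = beta * pessimistic + (1.0 - beta) * q3
    # No bootstrapping past a terminal transition.
    return reward if done else reward + gamma * blended


def averaged_value(snapshot_values: Sequence[float]) -> float:
    """Hypothetical average of values produced by the K most recent target snapshots.

    Assumption: each entry is the value that one earlier target network assigns
    to the same state-action pair; averaging them reduces per-update variance.
    """
    if not snapshot_values:
        raise ValueError("need at least one snapshot value")
    return sum(snapshot_values) / len(snapshot_values)


if __name__ == "__main__":
    # Blend: 0.5 * min(2, 4) + 0.5 * 6 = 4; target: 1 + 0.5 * 4 = 3.
    print(triplet_target(1.0, 2.0, 4.0, 6.0, beta=0.5, gamma=0.5))
    print(averaged_value([1.0, 2.0, 3.0]))
```

With beta = 1 this sketch reduces to a clipped double-critic target, and with beta = 0 it reduces to a single-critic target, which is one way to read the "opposite properties" the abstract refers to. The article itself should be consulted for the actual weighting and averaging scheme.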
Keywords
Averaging technology, deep reinforcement learning (DRL), estimation bias, triplet networks
Discipline
Numerical Analysis and Scientific Computing | Software Engineering | Theory and Algorithms
Research Areas
Data Science and Engineering
Publication
IEEE Transactions on Neural Networks and Learning Systems
Volume
31
Issue
11
First Page
4933
Last Page
4945
ISSN
2162-237X
Identifier
10.1109/TNNLS.2019.2959129
Publisher
IEEE
Embargo Period
5-10-2021
Citation
WU, Dongming; DONG, Xingping; SHEN, Jianbing; and HOI, Steven C. H.
Reducing estimation bias via triplet-average deep deterministic policy gradient. (2020). IEEE Transactions on Neural Networks and Learning Systems, 31(11), 4933-4945.
Available at: https://ink.library.smu.edu.sg/sis_research/5920
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/TNNLS.2019.2959129
Included in
Numerical Analysis and Scientific Computing Commons, Software Engineering Commons, Theory and Algorithms Commons