Research Collection School Of Computing and Information Systems

Reducing estimation bias via triplet-average deep deterministic policy gradient

Publication Type

Journal Article

Version

acceptedVersion

Publication Date

11-2020

Abstract

The overestimation caused by function approximation is a well-known property of Q-learning algorithms, especially in single-critic models, and it leads to poor performance in practical tasks. However, the opposite property, underestimation, which often occurs in Q-learning methods with double critics, has been largely left untouched. In this article, we investigate the underestimation phenomenon in the recent twin delayed deep deterministic actor-critic algorithm and theoretically demonstrate its existence. We also observe that this underestimation bias does indeed hurt performance in various experiments. Considering the opposite properties of single-critic and double-critic methods, we propose a novel triplet-average deep deterministic policy gradient algorithm that takes the weighted action value of three target critics to reduce the estimation bias. Given the connection between estimation bias and approximation error, we suggest averaging previous target values to reduce per-update error and further improve performance. Extensive empirical results over various continuous control tasks in OpenAI Gym show that our approach outperforms the state-of-the-art methods. Source code is available at https://github.com/shenjianbing/TADDRL.
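The two ideas named in the abstract, a weighted value over three target critics and an average over previous target values, can be illustrated with a minimal sketch. This is not the authors' implementation: the exact weighting scheme is not given in this record, so the blend below (the minimum of two critics mixed with a third critic through a hypothetical weight `beta`) is an assumption, as are the function and class names, the default `gamma`, and the window size `k`.

```python
from collections import deque


def weighted_triplet_target(reward, done, q1, q2, q3, gamma=0.99, beta=0.5):
    """Hypothetical bootstrapped target built from three target-critic values.

    Assumption: min(q1, q2) tends to underestimate, a single critic q3 tends
    to overestimate, so a convex combination of the two may reduce the bias.
    """
    pessimistic = min(q1, q2)                        # double-critic style estimate
    blended = beta * pessimistic + (1.0 - beta) * q3  # weight in the third critic
    return reward + gamma * (1.0 - done) * blended    # no bootstrap at episode end


class TargetAverager:
    """Keeps the last k target values and returns their running mean (assumed form)."""

    def __init__(self, k=3):
        self.history = deque(maxlen=k)  # oldest value is dropped automatically

    def update(self, target):
        self.history.append(target)
        return sum(self.history) / len(self.history)


if __name__ == "__main__":
    y = weighted_triplet_target(reward=1.0, done=0.0, q1=2.0, q2=4.0, q3=6.0,
                                gamma=0.5, beta=0.5)
    averager = TargetAverager(k=2)
    print(y, averager.update(y))
```

Setting `beta=1.0` in this sketch recovers a pure double-critic target, and `beta=0.0` recovers a single-critic target, which is one way to see how a weighted value could sit between the two biases.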

Keywords

Averaging technology, deep reinforcement learning (DRL), estimation bias, triplet networks

Discipline

Numerical Analysis and Scientific Computing | Software Engineering | Theory and Algorithms

Research Areas

Data Science and Engineering

Publication

IEEE Transactions on Neural Networks and Learning Systems

Volume

31

Issue

11

First Page

4933

Last Page

4945

ISSN

2162-237X

Identifier

10.1109/TNNLS.2019.2959129

Publisher

IEEE

Embargo Period

5-10-2021

Citation

WU, Dongming; DONG, Xingping; SHEN, Jianbing; and HOI, Steven C. H. Reducing estimation bias via triplet-average deep deterministic policy gradient. (2020). IEEE Transactions on Neural Networks and Learning Systems. 31(11), 4933-4945.
Available at: https://ink.library.smu.edu.sg/sis_research/5920

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

https://doi.org/10.1109/TNNLS.2019.2959129

Included in

Numerical Analysis and Scientific Computing Commons, Software Engineering Commons, Theory and Algorithms Commons
