Publication Type
Journal Article
Version
publishedVersion
Publication Date
1-2012
Abstract
The basic tenet of a learning process is that an agent should learn only as much and for as long as necessary. In reinforcement learning, the learning process is divided between exploration and exploitation. Given the complexity of the problem domain and the randomness of the learning process, the exact duration of the reinforcement learning process can never be known with certainty. Using an inaccurate number of training iterations leads either to non-convergence or to over-training of the learning agent. This work addresses these issues by proposing a technique that self-regulates the exploration rate and the training duration, leading to efficient convergence. The idea originates from the intuitive understanding that exploration is necessary only when the success rate is low; the rate of exploration should therefore vary in inverse proportion to the rate of success. In addition, the change in exploration-exploitation rates alters the duration of the learning process, so that duration becomes adaptive to the current status of the learning process. Experimental results from the K-Armed Bandit and Air Combat Maneuver scenarios show that optimal action policies can be discovered with the right number of training iterations. In essence, the proposed method eliminates the guesswork on the amount of exploration needed during reinforcement learning.
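To make the mechanism concrete, the following is a minimal Python sketch of success-rate-driven exploration on a Bernoulli k-armed bandit. The update rule eps = 1 - success_rate, the reward-based success window, the stopping threshold, and the bandit setup are illustrative assumptions, not the paper's exact formulation.

import random

# A minimal sketch of success-rate-driven exploration on a Bernoulli
# k-armed bandit. The rule eps = 1 - success_rate and the stopping
# criterion are illustrative assumptions, not the paper's formulation.

K = 10
true_means = [0.2] * K
true_means[3] = 0.95            # one clearly best arm (hypothetical setup)

q = [0.0] * K                   # estimated action values
counts = [0] * K

window = []                     # recent rewards, used as the success rate
WINDOW_SIZE = 50
TARGET = 0.90                   # stop once recent success stays this high
MAX_STEPS = 100_000             # safety cap for the sketch

step = 0
while step < MAX_STEPS:
    step += 1
    success_rate = sum(window) / len(window) if window else 0.0
    eps = 1.0 - success_rate    # explore in inverse proportion to success

    if random.random() < eps:
        action = random.randrange(K)                   # explore
    else:
        action = max(range(K), key=lambda a: q[a])     # exploit

    reward = 1.0 if random.random() < true_means[action] else 0.0
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action] # sample average

    window.append(reward)
    if len(window) > WINDOW_SIZE:
        window.pop(0)

    # Adaptive duration: terminate once exploration is no longer needed,
    # instead of running a fixed, guessed number of iterations.
    if len(window) == WINDOW_SIZE and success_rate >= TARGET:
        break

greedy = max(range(K), key=lambda a: q[a])
print(f"stopped after {step} steps; greedy arm = {greedy}")

Because exploration decays only as success accumulates, the loop itself decides when training ends, which is the adaptive-duration behavior the abstract describes.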
Keywords
Reinforcement learning, Exploration-exploitation dilemma, K-armed bandit, Pursuit-evasion, Self-organizing neural network
Discipline
Computer Engineering | Databases and Information Systems | OS and Networks
Research Areas
Data Science and Engineering
Publication
Procedia Computer Science
Volume
13
First Page
18
Last Page
30
ISSN
1877-0509
Identifier
10.1016/j.procs.2012.09.110
Publisher
Elsevier (under a Creative Commons Attribution-NonCommercial-NoDerivatives License)
Citation
TENG, Teck-Hou; TAN, Ah-Hwee; and TAN, Yuan-Sin.
Self-regulating action exploration in reinforcement learning. (2012). Procedia Computer Science, 13, 18-30.
Available at: https://ink.library.smu.edu.sg/sis_research/5239
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1016/j.procs.2012.09.110
Included in
Computer Engineering Commons, Databases and Information Systems Commons, OS and Networks Commons