Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
12-2012
Abstract
Exploration is necessary during reinforcement learning to discover new solutions in a given problem space. Most reinforcement learning systems, however, adopt a simple strategy of randomly selecting an action from among all the available actions. This paper proposes a novel exploration strategy, known as Knowledge-based Exploration, for guiding the exploration of a family of self-organizing neural networks in reinforcement learning. Specifically, exploration is directed towards unexplored and favorable action choices while steering away from negative action choices that are likely to fail. This is achieved by using the learned knowledge of the agent to identify prior action choices that led to low Q-values in similar situations. Consequently, the agent is expected to learn the right solutions in a shorter time, improving overall learning efficiency. Using a Pursuit-Evasion problem domain, we evaluate the efficacy of the knowledge-based exploration strategy in terms of task performance, rate of learning and model complexity. Comparison with random exploration and three other heuristic-based directed exploration strategies shows that Knowledge-based Exploration is significantly more effective and robust for reinforcement learning in real time.
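The idea in the abstract, excluding actions that previously led to low Q-values in similar situations and exploring among the rest, can be illustrated with a minimal sketch. This is not the paper's implementation, which operates on self-organizing neural networks. The flat `memory` list of (state, action, Q-value) triples, the `similarity` measure, both thresholds, and all function names are assumptions made for illustration only.

```python
import random

def similarity(s1, s2):
    """Hypothetical similarity: 1 minus the mean absolute difference of two
    equal-length state vectors whose components lie in [0, 1]."""
    return 1.0 - sum(abs(a - b) for a, b in zip(s1, s2)) / len(s1)

def knowledge_based_explore(state, actions, memory,
                            q_threshold=0.3, sim_threshold=0.8, rng=random):
    """Choose an exploratory action while avoiding known-bad choices.

    memory is an assumed list of (state, action, q_value) triples standing in
    for the agent's learned knowledge. Both thresholds are illustrative.
    """
    # Actions that gave a low Q-value in a situation similar to this one.
    bad = {
        a for (s, a, q) in memory
        if similarity(state, s) >= sim_threshold and q < q_threshold
    }
    # Remaining candidates are unexplored or previously favorable actions.
    candidates = [a for a in actions if a not in bad]
    # If every action is known to be bad, fall back to plain random exploration.
    return rng.choice(candidates if candidates else list(actions))

if __name__ == "__main__":
    memory = [
        ((0.0, 0.0), "left", 0.1),   # failed in a nearby state
        ((0.0, 0.0), "right", 0.9),  # worked in a nearby state
        ((1.0, 1.0), "up", 0.0),     # failed, but in a dissimilar state
    ]
    print(knowledge_based_explore((0.0, 0.1), ["left", "right", "up"], memory))
```

In this example "left" is excluded because it scored poorly in a nearby state, while "up" remains eligible because its poor score was recorded in a dissimilar state.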
Keywords
Reinforcement Learning, Self-Organizing Neural Network, Directed Exploration, Rule-Based System
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
2012 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT): December 4-7, Macau
Volume
2
First Page
332
Last Page
339
ISBN
9781467360579
Identifier
10.1109/WI-IAT.2012.154
Publisher
IEEE Computer Society
City or Country
Los Alamitos, CA
Citation
1
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/WI-IAT.2012.154