Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

12-2015

Abstract

Using feedback signals from the environment, a reinforcement learning (RL) system typically discovers action policies that recommend effective actions for given states based on a Q-value function. However, uncertainty in the estimation of the Q-values can delay the convergence of RL. To speed up RL convergence by accounting for such uncertainties, this paper proposes several enhancements to the estimation and learning of the Q-value using a self-organizing neural network. Specifically, a temporal difference method known as Q-learning is complemented by a Q-value Polarization procedure, which contrasts the Q-values using feedback signals on the effect of the recommended actions. The polarized Q-values are then learned by the self-organizing neural network using a Bi-directional Template Learning procedure. Furthermore, the polarized Q-values are in turn used to adapt the reward vigilance of the ART-based self-organizing neural network using a Bi-directional Adaptation procedure. The efficacy of the resulting system, called Fast Learning FALCON (FL-FALCON), is illustrated using two single-task problem domains with large MDPs. The experimental results from both problem domains consistently show that FL-FALCON converges faster than the compared approaches.
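For readers unfamiliar with the temporal difference method the abstract builds upon, the sketch below shows the standard tabular Q-learning update only. It does not reproduce the paper's Q-value Polarization, Bi-directional Template Learning, or Bi-directional Adaptation procedures; the state/action space sizes and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes and hyperparameters (assumptions, not from the paper).
n_states, n_actions = 10, 4
alpha, gamma = 0.1, 0.9  # learning rate and discount factor
Q = np.zeros((n_states, n_actions))

def q_learning_update(s, a, r, s_next):
    """One standard Q-learning temporal-difference step:
    move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```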

Discipline

Databases and Information Systems | OS and Networks

Research Areas

Data Science and Engineering

Publication

Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2015)

Volume

2

First Page

51

Last Page

58

Identifier

10.1109/WI-IAT.2015.103

Publisher

IEEE

City or Country

New York
