Publication Type
Journal Article
Version
submittedVersion
Publication Date
6-2019
Abstract
In this paper we propose a smoothed Q-learning algorithm for estimating optimal dynamic treatment regimes. In contrast to the Q-learning algorithm, which involves non-regular inference, we show that under the assumptions adopted in this paper, the proposed smoothed Q-learning estimator is asymptotically normally distributed even when the Q-learning estimator is not, and that its asymptotic variance can be consistently estimated. As a result, inference based on the smoothed Q-learning estimator is standard. We derive the optimal smoothing parameter and propose a data-driven method for estimating it. The finite sample properties of the smoothed Q-learning estimator are studied and compared with those of several existing estimators, including the Q-learning estimator, via an extensive simulation study. We illustrate the new method by analyzing data from the Clinical Antipsychotic Trials of Intervention Effectiveness-Alzheimer’s Disease (CATIE-AD) study.
Keywords
Asymptotic normality, Exceptional law, Optimal smoothing parameter, Sequential randomization, Wald-type inference
Discipline
Econometrics
Research Areas
Econometrics
Publication
Scandinavian Journal of Statistics
Volume
46
Issue
2
First Page
446
Last Page
469
ISSN
0303-6898
Identifier
10.1111/sjos.12359
Publisher
Wiley
Citation
FAN, Yanqin; HE, Ming; SU, Liangjun; and ZHOU, Xiao-Hua.
A smoothed Q-learning algorithm for estimating optimal dynamic treatment regimes. (2019). Scandinavian Journal of Statistics. 46, (2), 446-469.
Available at: https://ink.library.smu.edu.sg/soe_research/2044
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1111/sjos.12359