Publication Type

Journal Article

Version

submittedVersion

Publication Date

6-2019

Abstract

In this paper we propose a smoothed Q-learning algorithm for estimating optimal dynamic treatment regimes. In contrast to the Q-learning algorithm in which non-regular inference is involved, we show that under assumptions adopted in this paper, the proposed smoothed Q-learning estimator is asymptotically normally distributed even when the Q-learning estimator is not, and that its asymptotic variance can be consistently estimated. As a result, inference based on the smoothed Q-learning estimator is standard. We derive the optimal smoothing parameter and propose a data-driven method for estimating it. The finite sample properties of the smoothed Q-learning estimator are studied and compared with those of several existing estimators, including the Q-learning estimator, via an extensive simulation study. We illustrate the new method by analyzing data from the Clinical Antipsychotic Trials of Intervention Effectiveness-Alzheimer’s Disease (CATIE-AD) study.

Keywords

Asymptotic normality, Exceptional law, Optimal smoothing parameter, Sequential randomization, Wald-type inference

Discipline

Econometrics

Research Areas

Econometrics

Publication

Scandinavian Journal of Statistics

Volume

46

Issue

2

First Page

446

Last Page

469

ISSN

0303-6898

Identifier

10.1111/sjos.12359

Publisher

Wiley

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1111/sjos.12359

Included in

Econometrics Commons
