Publication Type
Journal Article
Version
acceptedVersion
Publication Date
7-2021
Abstract
Recommendations are prevalent in Web applications (e.g., search ranking, item recommendation, advertisement placement). Learning from bandit feedback is challenging due to the sparsity of feedback, which is limited to system-provided actions. In this work, we focus on batch learning from logs of recommender systems involving both bandit and organic feedback. We develop a probabilistic framework with a likelihood function for estimating not only explicit positive observations but also implicit negative observations inferred from the data. Moreover, we introduce a latent variable model for organic-bandit feedback to robustly capture user preference distributions. Next, we analyze the behavior of the new likelihood under two scenarios, i.e., with and without counterfactual re-weighting. For speedier item ranking, we further investigate the possibility of using the Maximum-a-Posteriori (MAP) estimate instead of Monte Carlo (MC)-based approximation for prediction. Experiments on both real datasets and data from a simulation environment show substantial performance improvements over comparable baselines.
Keywords
Variational learning, Bandit feedback, Recommender systems, Computational advertising
Discipline
Databases and Information Systems | Data Science
Research Areas
Data Science and Engineering
Publication
Machine Learning
Volume
110
Issue
8
First Page
2085
Last Page
2105
ISSN
0885-6125
Identifier
10.1007/s10994-021-06028-0
Publisher
Springer
Embargo Period
12-13-2021
Citation
TRUONG, Quoc Tuan and LAUW, Hady W.
Variational learning from implicit bandit feedback. (2021). Machine Learning. 110, (8), 2085-2105.
Available at: https://ink.library.smu.edu.sg/sis_research/6431
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1007/s10994-021-06028-0