Publication Type

Journal Article

Version

acceptedVersion

Publication Date

7-2021

Abstract

Recommendations are prevalent in Web applications (e.g., search ranking, item recommendation, advertisement placement). Learning from bandit feedback is challenging due to the sparsity of feedback, which is limited to system-provided actions. In this work, we focus on batch learning from logs of recommender systems involving both bandit and organic feedback. We develop a probabilistic framework with a likelihood function for estimating not only explicit positive observations but also implicit negative observations inferred from the data. Moreover, we introduce a latent variable model for organic-bandit feedback to robustly capture user preference distributions. Next, we analyze the behavior of the new likelihood under two scenarios, i.e., with and without counterfactual re-weighting. For speedier item ranking, we further investigate the possibility of using Maximum-a-Posteriori (MAP) estimation instead of Monte Carlo (MC)-based approximation for prediction. Experiments on both real datasets and data from a simulation environment show substantial performance improvements over comparable baselines.
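The counterfactual re-weighting mentioned in the abstract is commonly realized via inverse propensity scoring (IPS). The following is a minimal illustrative sketch of that general idea with synthetic data and hypothetical variable names; it is not the paper's implementation:

```python
import numpy as np

# Hypothetical logged bandit feedback: for each impression we record the
# logging policy's propensity (probability of the shown action), the
# observed reward (click = 1, no click = 0), and the probability the
# target policy would have shown the same action.
rng = np.random.default_rng(0)
n = 1000
propensities = rng.uniform(0.1, 0.9, size=n)   # logging-policy probabilities
rewards = rng.integers(0, 2, size=n)           # observed bandit rewards
target_probs = rng.uniform(0.0, 1.0, size=n)   # target policy's action probs

# Counterfactual (inverse-propensity) re-weighting: each logged reward is
# weighted by the ratio of target to logging probability, correcting for
# actions over- or under-represented in the log.
weights = target_probs / propensities
ips_estimate = np.mean(weights * rewards)
print(ips_estimate)
```

In practice, such weights are often clipped or self-normalized to control variance; the paper's contribution is to combine this style of re-weighting with a likelihood over both organic and bandit feedback.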

Keywords

Variational learning, Bandit feedback, Recommender systems, Computational advertising

Discipline

Databases and Information Systems | Data Science

Research Areas

Data Science and Engineering

Publication

Machine Learning

Volume

110

Issue

8

First Page

2085

Last Page

2105

ISSN

0885-6125

Identifier

10.1007/s10994-021-06028-0

Publisher

Springer

Embargo Period

12-13-2021

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1007/s10994-021-06028-0
