Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

11-2017

Abstract

In this paper, we study the combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs). Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a simple greedy policy, named greedy CMAB (G-CMAB), achieves bounded regret. This improves the result in previous work, which shows that the regret is O (log T) under no such assumption on the ATPs. Then, we numerically show that G-CMAB achieves bounded regret in a real-world movie recommendation problem, where the action corresponds to recommending a set of movies, arms correspond to the edges between movies and users, and the goal is to maximize the total number of users that are attracted by at least one movie. In addition to this problem, our results directly apply to the online influence maximization (OIM) problem studied in numerous prior works.

Keywords

Combinatorial multi-armed bandit, probabilistically triggered arms, bounded regret, online learning

Discipline

Operations and Supply Chain Management

Research Areas

Operations Management

Publication

2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP): Montreal, November 14-16: Proceedings

First Page

111

Last Page

115

ISBN

9781509059904

Identifier

10.1109/GlobalSIP.2017.8308614

Publisher

IEEE

City or Country

Piscataway, NJ

Additional URL

https://doi.org/10.1109/GlobalSIP.2017.8308614

Share

COinS