Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
11-2017
Abstract
In this paper, we study the combinatorial multi-armed bandit (CMAB) problem with probabilistically triggered arms (PTAs). Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a simple greedy policy, named greedy CMAB (G-CMAB), achieves bounded regret. This improves on prior work, which shows O(log T) regret without any such assumption on the ATPs. We then numerically show that G-CMAB achieves bounded regret in a real-world movie recommendation problem, where an action corresponds to recommending a set of movies, the arms correspond to the edges between movies and users, and the goal is to maximize the total number of users attracted by at least one movie. In addition to this problem, our results directly apply to the online influence maximization (OIM) problem studied in numerous prior works.
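The abstract describes the G-CMAB policy and the movie recommendation setting only at a high level; the following is a minimal Python/NumPy sketch of that greedy idea under stated assumptions. The instance sizes, the extra triggering probability `q`, and the helpers `estimated_coverage` and `greedy_select` are illustrative assumptions, not the paper's exact model or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance (sizes and probabilities are illustrative, not from the paper)
n_movies, n_users, K = 8, 30, 3                            # recommend K movies per round
true_p = rng.uniform(0.05, 0.6, size=(n_movies, n_users))  # unknown movie-user attraction probabilities
q = 0.05                                                    # positive triggering probability for non-selected edges (ATP > 0)

counts = np.zeros((n_movies, n_users))   # how often each edge (arm) was observed
sums = np.zeros((n_movies, n_users))     # sum of observed Bernoulli outcomes per edge

def estimated_coverage(selected, p_hat):
    """Estimated expected number of users attracted by at least one selected movie."""
    none_attracted = np.prod(1.0 - p_hat[selected, :], axis=0)
    return float(np.sum(1.0 - none_attracted))

def greedy_select(p_hat, k):
    """Greedy maximum-coverage oracle run on the current edge estimates."""
    selected = []
    for _ in range(k):
        gains = [(estimated_coverage(selected + [m], p_hat), m)
                 for m in range(n_movies) if m not in selected]
        selected.append(max(gains)[1])
    return selected

for t in range(2000):
    # G-CMAB idea: act greedily on empirical means, with no exploration bonus.
    p_hat = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    action = greedy_select(p_hat, K)

    # Every edge has a positive probability of being triggered (observed):
    # edges of recommended movies are always triggered; the rest with probability q.
    triggered = np.zeros((n_movies, n_users), dtype=bool)
    triggered[action, :] = True
    triggered |= rng.random((n_movies, n_users)) < q

    outcomes = rng.random((n_movies, n_users)) < true_p     # latent Bernoulli outcomes
    counts[triggered] += 1
    sums[triggered] += outcomes[triggered]
```

The point mirrored here is that the policy never adds an exploration bonus: because every arm is triggered with positive probability regardless of the chosen action, the empirical estimates keep improving, which is the mechanism behind the bounded-regret result.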
Keywords
Combinatorial multi-armed bandit, probabilistically triggered arms, bounded regret, online learning
Discipline
Operations and Supply Chain Management
Research Areas
Operations Management
Publication
2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP): Montreal, November 14-16: Proceedings
First Page
111
Last Page
115
ISBN
9781509059904
Identifier
10.1109/GlobalSIP.2017.8308614
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
SARITAC, Omer and TEKIN, Cem.
Combinatorial multi-armed bandit problem with probabilistically triggered arms: A case with bounded regret. (2017). 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP): Montreal, November 14-16: Proceedings. 111-115.
Available at: https://ink.library.smu.edu.sg/lkcsb_research/7605
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/GlobalSIP.2017.8308614