Publication Type

Conference Proceeding Article

Publication Date

5-2014

Abstract

Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a Dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, an assumption that might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) we introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) we introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) we empirically evaluate them against an existing multi-armed bandit DCOP algorithm on dynamic DCOPs.
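
For context on the algorithms named above: Distributed R-learning builds on Schwartz's R-learning, an average-reward variant of Q-learning that maintains an explicit running estimate of the gain (average reward per step), while RVI Q-learning instead normalizes Q-values by the value of a reference state. Below is a minimal, single-agent sketch of the textbook R-learning update on a hypothetical two-state MDP; the ToyMDP class, learning rates, and step counts are illustrative assumptions, not the paper's distributed multi-agent algorithms.

import random
from collections import defaultdict

class ToyMDP:
    """Hypothetical two-state MDP: the next state is determined by the
    chosen action, loosely mirroring how an MD-DCOP's next DCOP depends
    on the current value assignments."""
    ACTIONS = (0, 1)

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = action
        return reward, self.state

def r_learning(env, steps=20000, alpha=0.05, beta=0.1, epsilon=0.1):
    """Textbook R-learning (Schwartz, 1993): learns relative action
    values Q and an estimate rho of the average reward per step."""
    Q = defaultdict(float)   # Q[(state, action)] -> relative action value
    rho = 0.0                # running estimate of the average reward
    s = env.reset()
    for _ in range(steps):
        greedy = max(env.ACTIONS, key=lambda b: Q[(s, b)])
        # Epsilon-greedy choice balances exploration and exploitation.
        a = random.choice(env.ACTIONS) if random.random() < epsilon else greedy
        r, s2 = env.step(a)
        best_next = max(Q[(s2, b)] for b in env.ACTIONS)
        # Relative-value temporal-difference update.
        Q[(s, a)] += beta * (r - rho + best_next - Q[(s, a)])
        if a == greedy:      # update rho only on exploitation steps
            rho += alpha * (r - rho + best_next - max(Q[(s, b)] for b in env.ACTIONS))
        s = s2
    return Q, rho

Q, rho = r_learning(ToyMDP())
print(f"learned average reward estimate rho = {rho:.2f}")

Because the toy environment's next state is a function of the chosen action, the agent must account for how current assignments shape future rewards, which is the coupling across time steps that motivates the MD-DCOP model.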

Discipline

Artificial Intelligence and Robotics | Operations Research, Systems Engineering and Industrial Engineering

Research Areas

Intelligent Systems and Decision Analytics

Publication

AAMAS '14: Proceedings of the 2014 International Conference on Autonomous Agents and Multiagent Systems: May 5-9, 2014, Paris, France

First Page

1341

Last Page

1342

ISBN

9781450327381

Publisher

International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)

City or Country

Richland, SC

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

http://aamas2014.lip6.fr/proceedings/aamas/p1341.pdf
