Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

12-2021

Abstract

We propose a deep reinforcement learning (RL) method to learn a large neighborhood search (LNS) policy for integer programming (IP). The RL policy is trained as the destroy operator to select a subset of variables at each step, which is reoptimized by an IP solver as the repair operator. However, the combinatorial number of variable subsets prevents direct application of typical RL algorithms. To tackle this challenge, we represent all subsets by factorizing them into binary decisions on each variable. We then design a neural network to learn policies for each variable in parallel, trained by a customized actor-critic algorithm. We evaluate the proposed method on four representative IP problems. Results show that it can find better solutions than SCIP in much less time, and significantly outperforms other LNS baselines with the same runtime. Moreover, these advantages notably persist when the policies generalize to larger problems. Further experiments with Gurobi also reveal that our method can outperform this state-of-the-art commercial solver within the same time limit.
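The factorization described in the abstract can be illustrated with a minimal sketch (hypothetical code, not the authors' implementation): rather than scoring all 2^n variable subsets directly, a policy network outputs one Bernoulli probability per variable, and the "destroy" set is sampled as n independent binary decisions. The function name and probabilities below are illustrative assumptions.

```python
import numpy as np

def sample_destroy_set(probs, rng):
    """Sample a subset of variables to re-optimize.

    probs: per-variable selection probabilities (in the paper these
    would come from a neural network); returns the indices of
    variables chosen for "destruction", i.e. handed back to the IP
    solver for repair.
    """
    decisions = rng.random(len(probs)) < probs  # n independent coin flips
    return np.flatnonzero(decisions)

rng = np.random.default_rng(0)
# Illustrative per-variable probabilities for a 6-variable IP instance.
probs = np.array([0.9, 0.1, 0.5, 0.8, 0.2, 0.6])
subset = sample_destroy_set(probs, rng)
print(subset)  # indices the repair step would re-solve
```

Because each decision is an independent Bernoulli variable, the action space grows linearly in the number of variables instead of exponentially, which is what makes standard actor-critic training tractable here.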

Keywords

Actor-critic algorithm, Binary decision, Integer programming problems, Neural-networks, Reinforcement learning algorithms

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the 35th Conference on Neural Information Processing Systems, Virtual conference, 2021 Dec 6-14

Volume

36

First Page

30075

Last Page

30087

ISBN

9781713845393

Identifier

10.48550/arXiv.2111.03466

Publisher

Neural Information Processing Systems Foundation

City or Country

California

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.48550/arXiv.2111.03466
