Publication Type

Journal Article

Version

acceptedVersion

Publication Date

10-2025

Abstract

In recent years, applying deep models to automatically learn construction heuristics for vehicle routing problems has achieved remarkable advancements. However, such models are less effective at searching for solutions due to two primary limitations: they rely on deterministic probability distributions and overlook the strategic advantage of prioritizing nearby unvisited nodes during route construction, resulting in suboptimal policies. In this paper, we propose a novel lightweight population-based policy optimization (LPPO) framework that learns a diverse population of solution strategies through innovative perturbation factors, thereby facilitating search exploration. Moreover, we design a localized attention synthesis (LAS) network that dynamically refines the node selection process by prioritizing effective and informative decision-relevant features. To further improve search efficiency, we leverage a cluster search scheme during inference that rapidly identifies the most effective search strategy in the population. We apply LPPO to the pickup and delivery traveling salesman problem (PDTSP) and the multi-commodity PDTSP (m-PDTSP). Empirical results show that LPPO achieves lower gaps and better generalization than state-of-the-art deep models specialized for PDP variants.
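
A minimal, hypothetical sketch (NumPy only) of the population-based construction idea described above: each member of a small policy population biases node selection toward nearby unvisited nodes with a different perturbation factor, and the best constructed tour is kept. The function names, the distance-based perturbation form, and the random instance are illustrative assumptions, not the authors' LPPO implementation.

import numpy as np

def construct_tour(dist, factor, rng):
    # Stochastic construction: sample the next node with probability
    # proportional to exp(-factor * distance from the current node),
    # so larger factors favour nearer unvisited nodes.
    n = dist.shape[0]
    current, unvisited = 0, set(range(1, n))
    tour = [0]
    while unvisited:
        cand = np.array(sorted(unvisited))
        logits = -factor * dist[current, cand]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        current = rng.choice(cand, p=probs)
        unvisited.remove(current)
        tour.append(current)
    return tour

def tour_length(dist, tour):
    return sum(dist[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

rng = np.random.default_rng(0)
coords = rng.random((20, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

# A small "population" of perturbation factors; keep the best tour found.
population = [0.5, 2.0, 5.0, 10.0]
best = min((tour_length(dist, construct_tour(dist, f, rng)), f) for f in population)
print(f"best length {best[0]:.3f} from perturbation factor {best[1]}")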

Keywords

Deep reinforcement learning, Localized attention synthesis, Pickup and delivery problems, Population-based search strategy

Discipline

Computer Sciences | Operations Research, Systems Engineering and Industrial Engineering

Research Areas

Intelligent Systems and Optimization

Publication

Computers & Industrial Engineering

Volume

208

First Page

1

Last Page

12

ISSN

0360-8352

Identifier

10.1016/j.cie.2025.111376

Publisher

Elsevier

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1016/j.cie.2025.111376
