Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

9-2025

Abstract

Recent Deep Reinforcement Learning (DRL) techniques have advanced solutions to Vehicle Routing Problems (VRPs). However, many of these methods focus exclusively on optimizing distance-oriented objectives (i.e., minimizing route length), often overlooking drivers' implicit preferences for routes. These preferences, which are crucial in practice, are challenging to model with traditional DRL approaches. To address this gap, we propose a preference-based DRL method whose reward design and optimization objective are specialized to learn from historical route preferences. Our experiments demonstrate that the method aligns generated solutions more closely with human preferences. Moreover, it exhibits strong generalization across a variety of instances, offering a robust solution for diverse VRP scenarios.
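
To make the idea of a preference-based reward concrete, the minimal sketch below combines negative tour length with an edge-overlap bonus against historical routes. This is an illustrative assumption only: the function names (`preference_reward`, `edge_overlap`), the weighting term `alpha`, and the overlap-based reward form are hypothetical and are not the paper's actual formulation.

```python
import numpy as np

def tour_length(route, coords):
    """Total Euclidean length of a closed tour over node indices."""
    pts = coords[route]
    diffs = pts - np.roll(pts, -1, axis=0)  # edge vectors, wrapping to the start
    return float(np.linalg.norm(diffs, axis=1).sum())

def edge_set(route):
    """Undirected edge set of a closed tour."""
    return {frozenset((route[i], route[(i + 1) % len(route)]))
            for i in range(len(route))}

def preference_reward(route, coords, historical_routes, alpha=0.5):
    """Reward = -length + alpha * mean edge overlap with historical routes.

    The overlap term (an assumed proxy, not the paper's design) nudges a
    DRL policy toward edges drivers actually used, approximating the
    implicit route preferences described in the abstract.
    """
    edges = edge_set(route)
    overlaps = [len(edges & edge_set(h)) / len(edges) for h in historical_routes]
    return -tour_length(route, coords) + alpha * float(np.mean(overlaps))

# Toy usage: 5 customers, one historical route acting as the preference signal.
rng = np.random.default_rng(0)
coords = rng.random((5, 2))
history = [[0, 2, 1, 3, 4]]
print(preference_reward([0, 1, 2, 3, 4], coords, history))
print(preference_reward([0, 2, 1, 3, 4], coords, history))  # matches history, higher bonus
```

Under this sketch, a route reproducing historical edges earns a higher reward even at slightly greater length, which is the trade-off a preference-aligned objective must balance.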

Keywords

preference learning, deep reinforcement learning, historical route estimation, vehicle routing problem, driver behavior modeling, reward design, human-centered optimization, route preference alignment, neural combinatorial optimization, generalization

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2025)

Identifier

10.24963/ijcai.2025/955

City or Country

Montreal, Canada

Additional URL

https://doi.org/10.24963/ijcai.2025/955
