Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

11-2025

Abstract

Goal-oriented dialogues, such as recommendation and negotiation, often require balancing multiple conflicting objectives. Existing methods typically train separate models for specific combinations of objectives, which leads to computational and scalability issues. In this work, we aim to develop a new dialogue policy method that can adapt to varying objective preferences at inference time without retraining. This raises several challenges in terms of both (1) optimization strategy and (2) knowledge utilization. To address these, we propose a novel learning framework, the Preference Adaptive Dialogue Policy Planner (PADPP), for multi-objective goal-oriented dialogues. To tackle the former, we introduce a novel policy optimization scheme that leverages information gained from training the model on previously updated objective weights, accelerating learning on new weight settings. To address the latter, we utilize Generalized Policy Improvement (GPI) to ensure the effectiveness of the leveraged knowledge. Experimental results demonstrate that PADPP achieves superior adaptability and performance compared with state-of-the-art approaches, offering a scalable and flexible solution for multi-objective, goal-oriented dialogues. Code and data are available at the anonymous link.
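
The GPI step named in the abstract can be illustrated with a minimal sketch. It assumes a successor-feature setup (in the sense of Barreto et al.) in which each previously trained policy stores per-objective value estimates; the tabular representation, function names, and shapes below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gpi_action(psi_list, w, state_idx):
    """Generalized Policy Improvement (GPI) action selection.

    psi_list:  list of successor-feature tables, one per previously
               trained policy; each has shape (n_states, n_actions, d),
               where d is the number of objectives.
    w:         objective-weight (preference) vector of shape (d,),
               supplied at inference time.
    state_idx: index of the current state.

    Returns the action maximizing, over all stored policies, the
    preference-weighted value Q_i(s, a) = psi_i(s, a) . w
    """
    # Q-values of every stored policy under the new preference w:
    # resulting shape is (n_policies, n_actions).
    q = np.stack([psi[state_idx] @ w for psi in psi_list])
    # GPI: for each action, take the best stored policy's value,
    # then act greedily with respect to that upper envelope.
    return int(q.max(axis=0).argmax())

# Toy usage: two stored policies, 5 states, 3 actions, 2 objectives.
rng = np.random.default_rng(0)
psis = [rng.random((5, 3, 2)) for _ in range(2)]
print(gpi_action(psis, w=np.array([0.7, 0.3]), state_idx=0))
```

Because the weighted values are linear in the preference vector, a new objective weighting can be plugged in at inference time without retraining any policy, which is the adaptability property the abstract claims GPI helps secure.
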

Discipline

Artificial Intelligence and Robotics | Programming Languages and Compilers

Areas of Excellence

Digital transformation

Publication

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China (EMNLP 2025), November 4-9

First Page

22092

Last Page

22116

Identifier

10.18653/v1/2025.emnlp-main.1123

Publisher

Association for Computational Linguistics (ACL)

City or Country

Suzhou, China

Additional URL

https://doi.org/10.18653/v1/2025.emnlp-main.1123
