Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

7-2021

Abstract

Although conversational search has become a hot topic in both the dialogue research and IR communities, real breakthroughs have been limited by the scale and quality of available datasets. To address this fundamental obstacle, we introduce the Multimodal Multi-domain Conversational dataset (MMConv), a fully annotated collection of human-to-human role-playing dialogues spanning multiple domains and tasks. The contribution is two-fold. First, beyond task-oriented multimodal dialogues between user and agent pairs, the dialogues are fully annotated with dialogue belief states and dialogue acts. More importantly, we create a relatively comprehensive environment for conducting multimodal conversational search with real user settings, a structured venue database, an annotated image repository, and a crowd-sourced knowledge database. A detailed description of the data collection procedure is provided, along with a summary of the data structure and analysis. Second, we report a set of benchmark results for dialogue state tracking, conversational recommendation, and response generation, as well as for a unified model covering multiple tasks. We adopt state-of-the-art methods for these tasks to demonstrate the usability of the data, discuss the limitations of current methods, and set baselines for future studies.

Keywords

datasets, multimodal dialogue, conversational search

Discipline

Artificial Intelligence and Robotics | Numerical Analysis and Scientific Computing | Theory and Algorithms

Research Areas

Intelligent Systems and Optimization

Publication

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 11-15, 2021, Virtual Event

First Page

675

Last Page

684

ISBN

9781450380379

Identifier

10.1145/3404835.3462970

Publisher

ACM

City or Country

New York

Additional URL

https://doi.org/10.1145/3404835.3462970
