Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

5-2019

Abstract

Key to recommender systems is learning user preferences, which are expressed through various modalities. In online reviews, for instance, preferences manifest in numerical ratings, textual content, and visual images. In this work, we hypothesize that modelling these modalities jointly would result in a more holistic representation of a review, towards more accurate recommendations. We therefore propose Multimodal Review Generation (MRG), a neural approach that simultaneously models a rating prediction component and a review text generation component. We hypothesize that the shared user and item representations would augment rating prediction with richer information from review text, while sensitizing the generated review text to sentiment features of the user and item of interest. Moreover, when review photos are available, visual features could further inform review text generation. Comprehensive experiments on real-life datasets from several major US cities show that the proposed model outperforms comparable multimodal baselines, and an ablation analysis establishes the relative contributions of the respective components of the joint model.
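
To make the joint-modelling idea concrete, the following is a minimal sketch of a model with shared user/item representations feeding both a rating-prediction head and a review-text decoder, as the abstract describes. All layer names, sizes, and the visual-fusion scheme are illustrative assumptions, not the authors' actual MRG architecture; consult the paper at the DOI below for the real design.

    # Hypothetical sketch of a joint rating-prediction / review-generation
    # model in the spirit of MRG. Dimensions and fusion are assumptions.
    import torch
    import torch.nn as nn

    class JointRatingReviewModel(nn.Module):
        def __init__(self, n_users, n_items, vocab_size, dim=64):
            super().__init__()
            # Shared user/item representations feed both tasks.
            self.user_emb = nn.Embedding(n_users, dim)
            self.item_emb = nn.Embedding(n_items, dim)
            # Rating prediction head over the shared representation.
            self.rating_head = nn.Sequential(
                nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
            # Review generation: a GRU decoder initialized from the
            # shared representation, optionally fused with visual features.
            self.word_emb = nn.Embedding(vocab_size, dim)
            self.init_h = nn.Linear(2 * dim, dim)
            self.decoder = nn.GRU(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, vocab_size)

        def forward(self, users, items, words, visual=None):
            u, v = self.user_emb(users), self.item_emb(items)
            shared = torch.cat([u, v], dim=-1)           # (batch, 2*dim)
            rating = self.rating_head(shared).squeeze(-1)
            h0 = torch.tanh(self.init_h(shared)).unsqueeze(0)
            x = self.word_emb(words)                     # (batch, seq, dim)
            if visual is not None:
                # Naive fusion (an assumption): add projected image
                # features to every decoding step.
                x = x + visual.unsqueeze(1)
            dec, _ = self.decoder(x, h0)
            return rating, self.out(dec)                 # rating + word logits

Training such a model jointly would typically minimize a weighted sum of a regression loss (e.g., MSE) on the predicted rating and a cross-entropy loss on the generated words, so that gradients from both tasks shape the shared user and item embeddings.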

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Research Areas

Data Science and Engineering

Publication

WWW '19: Proceedings of the World Wide Web Conference, San Francisco, CA, May 13-17, 2019

First Page

1864

Last Page

1874

ISBN

9781450366748

Identifier

10.1145/3308558.3313463

Publisher

ACM

City or Country

New York

Copyright Owner and License

Publisher

Additional URL

https://doi.org/10.1145/3308558.3313463
