Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

2-2019

Abstract

Detecting the sentiment expressed by a document is a key task for many applications, e.g., modeling user preferences, monitoring consumer behaviors, and assessing product quality. Traditionally, the sentiment analysis task relies primarily on textual content. Fueled by the rise of mobile phones, which are often the only cameras on hand, documents on the Web (e.g., reviews, blog posts, tweets) are increasingly multimodal in nature, with photos accompanying the textual content. This raises the question of whether the visual component could be useful for sentiment analysis as well. In this work, we propose the Visual Aspect Attention Network (VistaNet), which leverages both textual and visual components. We observe that, with respect to sentiment detection, images often play a supporting role to text, highlighting the salient aspects of an entity rather than expressing sentiments independently of the text. Therefore, instead of using visual information as features, VistaNet relies on visual information as alignment, pointing out the important sentences of a document via attention. Experiments on restaurant reviews showcase the effectiveness of visual aspect attention vis-à-vis visual features or textual attention.
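
To make the attention-as-alignment idea concrete, below is a minimal sketch, not the authors' implementation: the encoding dimension, hidden size, projection matrices `W_s` and `W_m`, scoring vector `v`, and the mean pooling over images are all illustrative assumptions (the paper builds on learned text encoders and CNN image features). Each image scores every sentence, producing an image-conditioned attention distribution over sentences; the resulting document vectors are then pooled.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def visual_aspect_attention(sents, imgs, W_s, W_m, v):
    """Use images to attend over sentences (sketch, assumed shapes).

    sents: (n_sents, d) sentence encodings
    imgs:  (n_imgs, d)  image feature vectors
    """
    # Affinity of sentence i to image j: u[j, i] = v . tanh(W_s s_i + W_m m_j)
    proj_s = sents @ W_s.T                                     # (n_sents, h)
    proj_m = imgs @ W_m.T                                      # (n_imgs, h)
    u = np.tanh(proj_s[None, :, :] + proj_m[:, None, :]) @ v   # (n_imgs, n_sents)
    alpha = softmax(u, axis=1)       # per-image attention weights over sentences
    docs = alpha @ sents             # (n_imgs, d) image-conditioned document vectors
    return docs.mean(axis=0), alpha  # mean pooling over images is an assumption

# Toy usage: 5 sentences, 2 photos, 16-dim encodings, 8-dim attention space
d, h = 16, 8
sents = rng.normal(size=(5, d))
imgs = rng.normal(size=(2, d))
doc_vec, alpha = visual_aspect_attention(
    sents, imgs, rng.normal(size=(h, d)), rng.normal(size=(h, d)), rng.normal(size=h)
)
print(alpha.round(2))  # each row: one image's attention over the five sentences
```

The key design point this illustrates is that image features never enter the document representation directly; they only shape which sentences the model reads, which is how VistaNet treats images as alignment rather than as additional features.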

Keywords

sentiment analysis, multimodal, attention network

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Research Areas

Data Science and Engineering

Publication

Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019): Honolulu, HI, January 27 - February 1

First Page

305

Last Page

312

Identifier

10.1609/aaai.v33i01.3301305

Publisher

AAAI Press

City or Country

Menlo Park, CA

Additional URL

https://doi.org/10.1609/aaai.v33i01.3301305
