Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

3-2023

Abstract

In the richly multimedia Web, detecting sentiment signals expressed in images would support multiple applications, e.g., measuring customer satisfaction from online reviews, and analyzing trends and opinions from social media. Given an image, visual sentiment analysis aims at recognizing positive or negative sentiment, and occasionally neutral sentiment as well. A nascent yet promising direction is Transformer-based models applied to image data, whereby Vision Transformer (ViT) establishes remarkable performance on large-scale vision benchmarks. In addition to investigating the fitness of ViT for visual sentiment analysis, we further incorporate concept orientation into the self-attention mechanism, which is the core component of Transformer. The proposed model captures the relationships between image features and specific concepts. We conduct extensive experiments on Visual Sentiment Ontology (VSO) and Yelp.com online review datasets, showing that not only does the proposed model significantly improve upon the base model ViT in detecting visual sentiment but it also outperforms previous visual sentiment analysis models with narrowly-defined orientations. Additional analyses yield insightful results and better understanding of the concept-oriented self-attention mechanism.

Keywords

visual sentiment analysis, concept orientation, transformers

Discipline

Databases and Information Systems | Graphics and Human Computer Interfaces

Research Areas

Data Science and Engineering

Publication

WSDM '23: Proceedings of the 16th ACM International Conference on Web Search and Data Mining, Singapore, February 27-March 3

First Page

1111

Last Page

1119

ISBN

9781450394079

Identifier

10.1145/3539597.3570437

Publisher

ACM

City or Country

New York

Additional URL

https://doi.org/10.1145/3539597.3570437
