Publication Type

Journal Article

Version

publishedVersion

Publication Date

11-2015

Abstract

Social media has been a convenient platform for voicing opinions through posting messages, ranging from tweeting a short text to uploading a media file, or any combination of messages. Understanding the perceived emotions inherently underlying these user-generated contents (UGC) could bring light to emerging applications such as advertising and media analytics. Existing research efforts on affective computation are mostly dedicated to single media, either text captions or visual content. Few attempts for combined analysis of multiple media are made, despite that emotion can be viewed as an expression of multimodal experience. In this paper, we explore the learning of highly non-linear relationships that exist among low-level features across different modalities for emotion prediction. Using the deep Bolzmann machine (DBM), a joint density model over the space of multimodal inputs, including visual, auditory, and textual modalities, is developed. The model is trained directly using UGC data without any labeling efforts. While the model learns a joint representation over multimodal inputs, training samples in absence of certain modalities can also be leveraged. More importantly, the joint representation enables emotion-oriented cross-modal retrieval, for example, retrieval of videos using the text query "crazy cat". The model does not restrict the types of input and output, and hence, in principle, emotion prediction and retrieval on any combinations of media are feasible. Extensive experiments on web videos and images show that the learnt joint representation could be very compact and be complementary to hand-crafted features, leading to performance improvement in both emotion classification and cross-modal retrieval.

Keywords

Cross-modal retrieval, deep Boltzmann machine, emotion analysis, multimodal learning

Discipline

Data Storage Systems | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

IEEE Transactions on Multimedia

Volume

Issue

First Page

2008

Last Page

2020

ISSN

1520-9210

Identifier

10.1109/TMM.2015.2482228

Publisher

Institute of Electrical and Electronics Engineers

Citation

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Download

Included in

Data Storage Systems Commons, Graphics and Human Computer Interfaces Commons

COinS

Research Collection School Of Computing and Information Systems

Deep multimodal learning for affective analysis and retrieval

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

Volume

Issue

First Page

Last Page

ISSN

Identifier

Publisher

Citation

Creative Commons License

Included in

Search

Links

Browse

Links

Research Collection School Of Computing and Information Systems

Deep multimodal learning for affective analysis and retrieval

Author

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

Volume

Issue

First Page

Last Page

ISSN

Identifier

Publisher

Citation

Creative Commons License

Included in

Share

Search

Links

Browse

Links