Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2015
Abstract
Detecting emotions such as “anger” and “sadness” in user-generated videos has attracted widespread interest recently. The problem is challenging because effectively representing video data with multi-view information (e.g., audio, video or text) is not trivial. In contrast to existing works that extract features from each modality (view) separately and then apply early or late fusion, we propose to learn a joint density model over the space of multi-modal inputs (including visual, auditory and textual modalities) with a Deep Boltzmann Machine (DBM). The model is trained directly on user-generated Web videos without any labeling effort. More importantly, the deep architecture opens up the possibility of discovering the highly non-linear relationships that exist between low-level features across different modalities. The experimental results show that the DBM model learns a joint representation complementary to the hand-crafted visual and auditory features, leading to a 7.7% improvement in classification accuracy on the recently released VideoEmotion dataset.
Keywords
Deep Boltzmann Machine, Emotion analysis, Multimodal learning
Discipline
Databases and Information Systems | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 5th ACM International Conference on Multimedia Retrieval, ICMR 2015, Shanghai, China, June 23-26
First Page
619
Last Page
622
ISBN
9781450332743
Identifier
10.1145/2671188.2749400
Publisher
ACM
City or Country
Shanghai
Citation
PANG, Lei and NGO, Chong-wah.
Multimodal learning with deep Boltzmann Machine for emotion prediction in user generated videos. (2015). Proceedings of the 5th ACM International Conference on Multimedia Retrieval, ICMR 2015, Shanghai, China, June 23-26. 619-622.
Available at: https://ink.library.smu.edu.sg/sis_research/6502
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Included in
Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons