Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
11-2023
Abstract
Enabling machines to understand human emotions in multimodal contexts under dialogue scenarios, the task known as multimodal emotion recognition in conversation (MM-ERC), has been a hot research topic. MM-ERC has received consistent attention in recent years, and a diverse range of methods has been proposed to secure better task performance. Most existing works treat MM-ERC as a standard multimodal classification problem and perform multimodal feature disentanglement and fusion to maximize feature utility. Yet after revisiting the characteristics of MM-ERC, we argue that both the feature multimodality and the conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps. In this work, we aim to further push the task performance by taking full account of the above insights. On the one hand, during feature disentanglement, we devise a Dual-level Disentanglement Mechanism (DDM) based on the contrastive learning technique to decouple the features into both the modality space and the utterance space. On the other hand, during the feature fusion stage, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively; together they schedule the proper integration of multimodal and context features. Specifically, CFM explicitly and dynamically manages the contributions of the multimodal features, while CRM flexibly coordinates the introduction of dialogue contexts. On two public MM-ERC datasets, our system consistently achieves new state-of-the-art performance. Further analyses demonstrate that all our proposed mechanisms greatly facilitate the MM-ERC task by making full use of the multimodal and context features adaptively. Note that our proposed methods also have great potential to facilitate a broader range of conversational multimodal tasks.
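To make the dual-level disentanglement idea concrete, the following is a minimal, hypothetical PyTorch sketch and not the authors' implementation: the function name contrastive_loss, the tensor shapes, and the toy labels are all assumptions. It shows how one contrastive objective, applied once with modality labels and once with utterance labels, could encourage features to organize into a modality space and an utterance space.

```python
# Illustrative sketch only (not the paper's code): a supervised contrastive loss
# applied at two levels -- modality ids and utterance ids -- to mimic the idea of
# dual-level feature disentanglement.
import torch
import torch.nn.functional as F

def contrastive_loss(features: torch.Tensor, labels: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Pull together features that share a label and push apart the rest.

    features: (N, d) feature vectors (e.g., modality-specific utterance embeddings).
    labels:   (N,) integer labels -- modality ids for the modality space,
              utterance ids for the utterance space.
    """
    z = F.normalize(features, dim=-1)                     # cosine-similarity space
    sim = z @ z.t() / temperature                         # (N, N) similarity logits
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    logits = sim.masked_fill(eye, float("-inf"))          # exclude self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # average log-probability of positives per anchor (skip anchors with no positives)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return (-pos_log_prob[valid] / pos_counts[valid]).mean()

# Toy example: 3 utterances x 3 modalities (text/audio/vision), 16-dim features.
feats = torch.randn(9, 16)
modality_ids = torch.tensor([0, 1, 2] * 3)                 # modality of each row
utterance_ids = torch.tensor([0, 0, 0, 1, 1, 1, 2, 2, 2])  # utterance of each row
loss = contrastive_loss(feats, modality_ids) + contrastive_loss(feats, utterance_ids)
```

In this toy setup, the first term groups rows by modality and the second by utterance; the actual DDM in the paper is more elaborate, and this sketch only conveys the two-space intuition.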
Keywords
emotion recognition, multimodal learning
Discipline
Databases and Information Systems | Graphics and Human Computer Interfaces
Research Areas
Data Science and Engineering
Publication
MM '23: Proceedings of the 31st ACM International Conference on Multimedia: Ottawa, October 29 - November 3
First Page
5923
Last Page
5934
ISBN
9798400701085
Identifier
10.1145/3581783.3612053
Publisher
ACM
City or Country
New York
Citation
LI, Bobo; FEI, Hao; LIAO, Lizi; ZHAO, Yu; TENG, Chong; CHUA, Tat-Seng; JI, Donghong; and LI, Fei.
Revisiting disentanglement and fusion on modality and context in conversational multimodal emotion recognition. (2023). MM '23: Proceedings of the 31st ACM International Conference on Multimedia: Ottawa, October 29 - November 3. 5923-5934.
Available at: https://ink.library.smu.edu.sg/sis_research/8485
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3581783.3612053
Included in
Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons