Publication Type
Journal Article
Version
acceptedVersion
Publication Date
2-2024
Abstract
Sentiment analysis plays an indispensable part in human-computer interaction. Multimodal sentiment analysis can overcome the shortcomings of unimodal sentiment analysis by fusing multimodal data. However, how to extract improved feature representations and how to execute effective modality fusion are two crucial problems in multimodal sentiment analysis. Traditional work uses simple sub-models for feature extraction, ignores features of different scales, and fuses different modalities of data equally, which makes it easier to incorporate extraneous information and degrades analysis accuracy. In this paper, we propose a Multimodal Sentiment Analysis model based on Multi-scale feature extraction and Multi-task learning (M3SA). First, we propose a multi-scale feature extraction method that models the outputs of different hidden layers with channel attention. Second, we propose a multimodal fusion strategy based on the key modality, which uses the attention mechanism to raise the proportion of the key modality and mines the relationship between the key modality and the other modalities. Finally, we use a multi-task learning approach to train the proposed model, ensuring that the model can learn better feature representations. Experimental results on two publicly available multimodal sentiment analysis datasets demonstrate that the proposed method is effective and that the proposed model outperforms baselines.
Keywords
Multimodal sentiment analysis, multi-scale feature extraction, multi-task learning, multimodal data fusion
Discipline
Graphics and Human Computer Interfaces | Numerical Analysis and Scientific Computing
Publication
IEEE/ACM Transactions on Audio, Speech and Language Processing
Volume
32
First Page
1416
Last Page
1429
ISSN
2329-9290
Identifier
10.1109/TASLP.2024.3361374
Publisher
Association for Computing Machinery (ACM)
Citation
LIN, Changkai; CHENG, Hongju; RAO, Qiang; and YANG, Yang.
M3SA: Multimodal Sentiment Analysis based on multi-scale feature extraction and multi-task learning. (2024). IEEE/ACM Transactions on Audio, Speech and Language Processing. 32, 1416-1429.
Available at: https://ink.library.smu.edu.sg/sis_research/8755
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/TASLP.2024.3361374
Included in
Graphics and Human Computer Interfaces Commons, Numerical Analysis and Scientific Computing Commons