Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
1-2026
Abstract
In today's world, emotional support is increasingly essential, yet it remains challenging for both those seeking help and those offering it. Multimodal approaches to emotional support show great promise by integrating diverse data sources to provide empathetic, contextually relevant responses, fostering more effective interactions. However, current methods have notable limitations: they often rely solely on text, convert other data types into text, or provide emotion recognition only, thus overlooking the full potential of multimodal inputs. Moreover, many studies prioritize response generation without accurately identifying critical emotional support elements or ensuring the reliability of outputs. To overcome these issues, we introduce MultiMood, a new framework that (i) leverages multimodal embeddings from video, audio, and text to predict emotional components and to produce responses aligned with professional therapeutic standards. To improve trustworthiness, we (ii) incorporate novel psychological criteria and apply Reinforcement Learning (RL) to optimize large language models (LLMs) for consistent adherence to these standards. We also (iii) analyze several advanced LLMs to assess their multimodal emotional support capabilities. Experimental results show that MultiMood achieves state-of-the-art performance on the MESC and DFEW datasets, while RL-driven trustworthiness improvements are validated through human and LLM evaluations, demonstrating its superior capability in applying a multimodal framework in this domain.
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems
Areas of Excellence
Digital transformation
Publication
Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence (AAAI‑26), Singapore, January 20-27
First Page
1
Last Page
18
Identifier
10.48550/arXiv.2511.10011
City or Country
Singapore
Citation
LE, Huy M.; NGUYEN, Dat Tien; VO, Ngan T. T.; NGUYEN, Tuan D. Q.; BINH, Nguyen Le; NGUYEN, Duy Minh Ho; SONNTAG, Daniel; LIAO, Lizi; and NGUYEN, Binh T.
Reinforce trustworthiness in multimodal emotional support system. (2026). Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence (AAAI‑26), Singapore, January 20-27. 1-18.
Available at: https://ink.library.smu.edu.sg/sis_research/10750
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.48550/arXiv.2511.10011