Publication Type

Journal Article

Version

publishedVersion

Publication Date

8-2025

Abstract

Ensuring the safety and well-being of children is increasingly important, especially in a world where visual content is pervasive. This paper proposes a novel multimodal, multilingual, and multiclass sentiment analysis method for social media content, aimed at improving content moderation for child safety. Our approach integrates textual, visual, and audio data from videos, categorizing sentiment into four levels: positive, slightly negative, negative, and strongly negative, enabling granular detection of harmful content. To enhance explainability and trust, we also leverage interpretable mechanisms to analyze the contributions of each modality. Evaluation of our method demonstrates strong generalization across diverse video types and shows that most misclassifications arise from annotation inconsistencies or ambiguities, highlighting the model’s reliability in real-world scenarios. Notably, when compared to lightweight Multimodal Language Models, our method achieves higher accuracy and robustness. Overall, the evaluation confirms its effectiveness for video sentiment analysis and content moderation, particularly for child safety.
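
For illustration only, the sketch below shows one plausible way to combine per-modality features (text, visual, audio) into a four-level sentiment prediction, assuming a simple late-fusion design in PyTorch. The dimensions, module names, and fusion strategy are assumptions for the sketch and do not reflect the paper's actual architecture.

```python
import torch
import torch.nn as nn

class LateFusionSentimentClassifier(nn.Module):
    """Hypothetical late-fusion classifier: projects per-modality embeddings
    (text, visual, audio) into a shared space, concatenates them, and predicts
    one of four sentiment levels (positive, slightly negative, negative,
    strongly negative)."""

    def __init__(self, text_dim=768, visual_dim=512, audio_dim=128,
                 hidden_dim=256, num_classes=4):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_emb, visual_emb, audio_emb):
        # Concatenate the projected modality embeddings, then classify.
        fused = torch.cat([
            self.text_proj(text_emb),
            self.visual_proj(visual_emb),
            self.audio_proj(audio_emb),
        ], dim=-1)
        return self.classifier(fused)  # logits over the four sentiment levels

# Example with random tensors standing in for real per-modality features.
model = LateFusionSentimentClassifier()
logits = model(torch.randn(2, 768), torch.randn(2, 512), torch.randn(2, 128))
print(logits.shape)  # torch.Size([2, 4])
```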

Keywords

Sentiment Analysis, Multimodal Learning, Child Safety, Deep Learning, Explainable Sentiment Analysis, Video Sentiment Analysis, Multimodal Language Models, Multiclass

Discipline

Artificial Intelligence and Robotics | Social Media

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

IEEE Intelligent Systems

Volume

40

Issue

4

First Page

64

Last Page

72

ISSN

1541-1672

Identifier

10.1109/MIS.2025.3586158

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://doi.org/10.1109/MIS.2025.3586158
