Publication Type

Journal Article

Version

publishedVersion

Publication Date

9-2025

Abstract

Large language models (LLMs) have transformed sentiment analysis, yet balancing accuracy, efficiency, and explainability remains a critical challenge. This study presents the first comprehensive evaluation of DeepSeek-R1—an open-source reasoning model—against OpenAI’s GPT-4o and GPT-4o-mini. We test the full 671B model and its distilled variants, systematically documenting few-shot learning curves. Our experiments show DeepSeek-R1 achieves a 91.39% F1 score on 5-class sentiment and 99.31% accuracy on binary tasks with just 5 shots, an eightfold improvement in few-shot efficiency over GPT-4o. Architecture-specific distillation effects emerge, where a 32B Qwen2.5-based model outperforms the 70B Llama-based variant by 6.69 percentage points. While its reasoning process reduces throughput, DeepSeek-R1 offers superior explainability via transparent, step-by-step traces, establishing it as a powerful, interpretable open-source alternative.
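For illustration only, and not drawn from the article itself: a minimal Python sketch of how a 5-shot, 5-class sentiment run might be scored with the macro-F1 and accuracy figures cited in the abstract. The prompt template, label names, and the classify() stub below are assumptions for demonstration; the authors' actual prompts and evaluation pipeline are described in the paper.

# Minimal sketch (illustrative, not the authors' pipeline): scoring a
# 5-shot, 5-class sentiment evaluation with macro-F1 and accuracy.
from sklearn.metrics import f1_score, accuracy_score

# Assumed 5-class label set for demonstration purposes.
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]

# Five labeled demonstrations (the "5 shots"); placeholder examples.
FEW_SHOT_EXAMPLES = [
    ("The battery died within a week.", "very negative"),
    ("Shipping was slower than promised.", "negative"),
    ("It does what it says, nothing more.", "neutral"),
    ("Solid build quality for the price.", "positive"),
    ("Absolutely the best purchase I've made this year.", "very positive"),
]

def build_prompt(text: str) -> str:
    """Assemble a 5-shot prompt: labeled examples followed by the query."""
    shots = "\n".join(f"Review: {t}\nSentiment: {l}" for t, l in FEW_SHOT_EXAMPLES)
    return f"{shots}\nReview: {text}\nSentiment:"

def classify(text: str) -> str:
    """Placeholder for a call to the model under test (e.g., DeepSeek-R1 or GPT-4o)."""
    raise NotImplementedError("Send build_prompt(text) to the model and parse its label.")

def evaluate(test_set):
    """Score predictions with macro-F1 over the 5 classes, plus accuracy."""
    gold = [label for _, label in test_set]
    pred = [classify(text) for text, _ in test_set]
    return {
        "macro_f1": f1_score(gold, pred, labels=LABELS, average="macro"),
        "accuracy": accuracy_score(gold, pred),
    }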

Keywords

Sentiment Analysis, Large Language Models, Explainability, DeepSeek-R1, GPT-4o, Few-Shot Learning, Emotion Recognition

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

IEEE Intelligent Systems

First Page

1

Last Page

10

ISSN

1541-1672

Identifier

10.1109/MIS.2025.3614967

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://doi.org/10.1109/MIS.2025.3614967
