Publication Type

Journal Article

Version

acceptedVersion

Publication Date

10-2025

Abstract

Automated hate speech detection is an important tool in combating the spread of hate speech, particularly in social media. Numerous methods have been developed for the task, including a recent proliferation of deep-learning-based approaches. A variety of datasets have also been developed, exemplifying various manifestations of the hate-speech detection problem. We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods, mediated through the three most commonly used datasets. Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state of the art. We particularly focus our analysis on measures of practical performance, including detection accuracy, computational efficiency, capability in using pre-trained models, and domain generalization. In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state of the art, and identify future research directions.

Keywords

Hate speech detection, Natural language processing, Deep learning, Machine learning, Transformers

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

International Journal of Data Science and Analytics

Volume

20

First Page

3055

Last Page

3068

ISSN

2364-415X

Identifier

10.1007/s41060-024-00650-6

Publisher

Springer

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1007/s41060-024-00650-6
