Publication Type
Journal Article
Version
publishedVersion
Publication Date
4-2016
Abstract
Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper, we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. “prevalence”) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper, we show (by carrying out experiments using two learners, seven quantification-specific algorithms, and 11 TSC datasets) that using quantification-specific algorithms produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures. This is an extended version of a paper with the title “Tweet Sentiment: From Classification to Quantification” which appears in the Proceedings of the 6th ACM/IEEE International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2015).
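The distinction the abstract draws between labelling individual tweets and estimating class prevalence can be illustrated with a minimal sketch. It is not taken from the paper: the function names are hypothetical, "classify and count" is only the naive baseline of counting predicted labels, and the adjusted variant shown is one well-known binary correction from the quantification literature, not necessarily one of the seven algorithms the authors evaluate. The error measure is a plain mean absolute error, chosen here for simplicity.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """Naive prevalence estimate: the fraction of items predicted in each class."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {c: counts[c] / total for c in classes}


def adjusted_count(cc_positive, tpr, fpr):
    """Binary correction of a classify-and-count estimate (assumed setup).

    cc_positive: fraction of items predicted positive.
    tpr, fpr: the classifier's true and false positive rates, assumed to
    have been estimated on held-out labelled data.
    """
    if tpr == fpr:
        # The correction is undefined; fall back to the raw estimate.
        return cc_positive
    corrected = (cc_positive - fpr) / (tpr - fpr)
    # Clip to the valid prevalence range [0, 1].
    return min(1.0, max(0.0, corrected))


def mean_absolute_error(true_prev, est_prev):
    """Average absolute difference between true and estimated prevalences."""
    return sum(abs(true_prev[c] - est_prev[c]) for c in true_prev) / len(true_prev)


# Illustrative data only (made up for this sketch, not from the paper).
classes = ["Positive", "Negative", "Neutral"]
predicted = ["Positive"] * 5 + ["Negative"] * 3 + ["Neutral"] * 2
estimate = classify_and_count(predicted, classes)
```

The point of the sketch is that the quantity being scored is the estimated class distribution as a whole, so a classifier with good per-tweet accuracy can still give a biased prevalence estimate if its errors are not balanced across classes.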
Discipline
Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
Social Network Analysis and Mining
Volume
6
Issue
1
First Page
1
Last Page
31
ISSN
1869-5450
Identifier
10.1007/s13278-016-0327-z
Publisher
Springer Verlag (Germany)
Citation
GAO, Wei and SEBASTIANI, Fabrizio.
From classification to quantification in tweet sentiment analysis. (2016). Social Network Analysis and Mining. 6, (1), 1-31.
Available at: https://ink.library.smu.edu.sg/sis_research/4547
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/s13278-016-0327-z