Publication Type
Journal Article
Version
acceptedVersion
Publication Date
2-2022
Abstract
Text classification is a fundamental task in content analysis. Deep learning models have shown promising performance on text classification compared with shallow models. However, almost none of the existing models take advantage of human knowledge to help classify text. Human beings are more capable than machine learning models at understanding and capturing the implicit semantic information in text. In this article, we use guidance from human beings to classify text. We propose Crowd-powered learning for Text Classification (CrowdTC for short). We design questions and post them on a crowdsourcing platform to extract keywords from text. Sampling and clustering techniques are used to reduce the cost of crowdsourcing. We also present an attention-based neural network and a hybrid neural network that incorporate the extracted keywords as human guidance into deep neural networks. Extensive experiments on public datasets confirm that CrowdTC improves the text classification accuracy of neural networks by using the crowd-powered keyword guidance.
Keywords
Text classification, crowdsourcing, keyword extraction, neural networks
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
ACM Transactions on Knowledge Discovery from Data
Volume
16
Issue
1
First Page
15:1
Last Page
15:23
ISSN
1556-4681
Identifier
10.1145/3457216
Publisher
Association for Computing Machinery (ACM)
Citation
YANG, Keyu; GAO, Yunjun; LIANG, Lei; BIAN, Song; CHEN, Lu; and ZHENG, Baihua.
CrowdTC: Crowd-powered learning for text classification. (2022). ACM Transactions on Knowledge Discovery from Data. 16, (1), 15:1-15:23.
Available at: https://ink.library.smu.edu.sg/sis_research/7149
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3457216