Publication Type
Conference Proceeding Article
Version
submittedVersion
Publication Date
11-2009
Abstract
In this paper, we try to predict which categories will be classified less accurately than others in a classification task that involves multiple categories. Categories with poor predicted performance are identified before any classifiers are trained, so that additional steps can be taken to address their predicted poor accuracy. Inspired by work on query performance prediction in ad-hoc retrieval, we propose to predict classification performance using two measures, namely category size and category coherence. Our experiments on the 20-Newsgroup and Reuters-21578 datasets show that the Spearman rank correlation coefficient between the predicted rank of classification performance and the expected classification accuracy is as high as 0.9.
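The evaluation in the abstract relies on the Spearman rank correlation between a predicted ordering of categories and their classification accuracy. The following is a minimal sketch of that statistic only, not the authors' implementation: the function names `average_ranks` and `spearman` are hypothetical, and the five scores and accuracies are made-up illustrative values, not results from the paper. How the paper computes category size and category coherence is not given in this record, so the sketch does not attempt it.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest) to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical values for five categories (illustration only, not paper data).
predicted_score = [0.9, 0.4, 0.7, 0.2, 0.6]         # assumed predictor output
observed_accuracy = [0.95, 0.70, 0.60, 0.50, 0.80]  # assumed per-category accuracy

rho = spearman(predicted_score, observed_accuracy)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho close to 1 would mean the predictor orders the categories almost exactly as their measured accuracy does.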
Keywords
Classification performance prediction, Text classification
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
ACM Conference on Information and Knowledge Management (CIKM)
First Page
1891
Last Page
1894
ISBN
9781605585123
Identifier
10.1145/1645953.1646258
Publisher
ACM
Citation
SUN, Aixin; LIM, Ee Peng; and LIU, Ying.
What makes categories difficult to classify? (2009). ACM Conference on Information and Knowledge Management (CIKM). 1891-1894.
Available at: https://ink.library.smu.edu.sg/sis_research/488
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1145/1645953.1646258
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons