Publication Type

Conference Proceeding Article

Version

submittedVersion

Publication Date

11-2009

Abstract

In this paper, we try to predict which categories will be classified less accurately than others in a classification task that involves multiple categories. Categories with poor predicted performance are identified before any classifiers are trained, so that additional steps can be taken to address their predicted poor accuracy. Inspired by the work on query performance prediction in ad-hoc retrieval, we propose to predict classification performance using two measures, namely, category size and category coherence. Our experiments on the 20-Newsgroup and Reuters-21578 datasets show that the Spearman rank correlation coefficient between the predicted rank of classification performance and the expected classification accuracy is as high as 0.9.
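
The evaluation described in the abstract compares a predicted ranking of categories with their classification accuracy using the Spearman rank correlation coefficient. The following is a minimal sketch of that coefficient only, not the authors' implementation. The function names `ranks` and `spearman` and all numeric values are hypothetical and chosen for illustration; they are not taken from the paper.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-category values (assumed for illustration only):
# a predicted difficulty score and an observed accuracy for five categories.
predicted_score = [0.82, 0.35, 0.60, 0.91, 0.47]
observed_accuracy = [0.70, 0.52, 0.61, 0.93, 0.88]

rho = spearman(predicted_score, observed_accuracy)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho close to 1 would indicate that the predicted ranking orders the categories almost the same way as their measured accuracy does.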

Keywords

Classification performance prediction, Text classification

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Publication

ACM Conference on Information and Knowledge Management (CIKM)

First Page

1891

Last Page

1894

ISBN

9781605585123

Identifier

10.1145/1645953.1646258

Publisher

ACM

Citation

SUN, Aixin; LIM, Ee Peng; and LIU, Ying. What makes categories difficult to classify?. (2009). ACM Conference on Information and Knowledge Management (CIKM). 1891-1894.
Available at: https://ink.library.smu.edu.sg/sis_research/488

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://doi.org/10.1145/1645953.1646258

Included in

Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons

Research Collection School Of Computing and Information Systems

What makes categories difficult to classify?
