Publication Type
Journal Article
Version
publishedVersion
Publication Date
2003
Abstract
Hierarchical text classification, or simply hierarchical classification, refers to assigning a document to one or more suitable categories from a hierarchical category space. In our literature survey, we found that existing hierarchical classification experiments used a variety of measures to evaluate performance. These performance measures often assume independence between categories and do not consider documents misclassified into categories that are similar to, or not far from, the correct categories in the category tree. In this paper, we therefore propose new performance measures for hierarchical classification. The proposed performance measures consist of category similarity measures and distance-based measures that consider the contributions of misclassified documents. Our experiments on hierarchical classification methods based on SVM classifiers and binary Naïve Bayes classifiers showed that SVM classifiers perform better than Naïve Bayes classifiers on the Reuters-21578 collection according to the extended measures. A new classifier-centric measure, called the blocking measure, is also defined to examine the performance of subtree classifiers in a top-down level-based hierarchical classification method.
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
Journal of the American Society for Information Science and Technology (JASIST)
Volume
54
Issue
11
First Page
1014
Last Page
1028
ISSN
1532-2882
Identifier
10.1002/asi.10298
Publisher
Wiley
Citation
LIM, Ee Peng; SUN, Aixin; and NG, Wee-Keong.
Performance measurement framework for hierarchical text classification. (2003). Journal of the American Society for Information Science and Technology (JASIST). 54, (11), 1014-1028.
Available at: https://ink.library.smu.edu.sg/sis_research/166
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1002/asi.10298
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons