Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
11-2001
Abstract
Hierarchical classification refers to the assignment of one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees, where documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents to both leaf and internal categories. Because the standard performance measures assume independence between categories, they do not account for documents incorrectly classified into categories that are similar to, or not far from, the correct ones in the category tree. We therefore propose the Category-Similarity Measures and Distance-Based Measures, which consider the degree of misclassification when measuring classification performance. An experiment has been carried out to measure the performance of our proposed hierarchical classification method. The results showed that our method performs well on the Reuters text collection when enough training documents are given, and that the new measures do take into account the contributions of misclassified documents.
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
IEEE International Conference on Data Mining, 29 November-2 December 2001, San Jose, California: Proceedings
First Page
521
Last Page
528
ISBN
9780769511191
Identifier
10.1109/ICDM.2001.989560
Publisher
IEEE
City or Country
San Jose, CA, USA
Citation
SUN, Aixin and LIM, Ee Peng.
Hierarchical text classification and evaluation. (2001). IEEE International Conference on Data Mining, 29 November-2 December 2001, San Jose, California: Proceedings. 521-528.
Available at: https://ink.library.smu.edu.sg/sis_research/976
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1109/ICDM.2001.989560
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons