Publication Type
Journal Article
Version
acceptedVersion
Publication Date
10-2004
Abstract
One common approach in hierarchical text classification associates classifiers with nodes in the category tree and classifies text documents in a top-down manner. Classification methods using this top-down approach can scale well and cope with changes to the category tree. However, all these methods suffer from blocking, in which documents are wrongly rejected by classifiers at higher levels and therefore cannot be passed to the classifiers at lower levels. We propose a classifier-centric performance measure known as blocking factor to determine the extent of the blocking. Three methods are proposed to address the blocking problem, namely, threshold reduction, restricted voting, and extended multiplicative. Our experiments using support vector machine (SVM) classifiers on the Reuters collection show that all three methods reduce blocking and improve classification accuracy, and that the restricted voting method delivers the best performance.
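The blocking effect described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Node` class, the lookup-table scorer, the category names, and the threshold values are all hypothetical, and the abstract does not specify how threshold reduction is actually parameterized. The sketch only assumes that each node's classifier returns a confidence score, and that lowering the threshold at higher-level nodes lets more documents through to the lower levels.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """A category-tree node with its own (hypothetical) classifier."""
    name: str
    score: Callable[[str], float]  # assumed: confidence in [0, 1]
    children: List["Node"] = field(default_factory=list)


def make_scorer(scores: Dict[str, float]) -> Callable[[str], float]:
    """Toy stand-in for a trained classifier: a lookup table of scores."""
    return lambda doc: scores.get(doc, 0.0)


def classify_top_down(doc: str, node: Node,
                      internal_threshold: float,
                      leaf_threshold: float) -> List[str]:
    """Return the leaf categories that accept doc, descending top-down."""
    is_leaf = not node.children
    threshold = leaf_threshold if is_leaf else internal_threshold
    if node.score(doc) < threshold:
        # Rejected here: no classifier in this subtree ever sees the doc.
        return []
    if is_leaf:
        return [node.name]
    labels: List[str] = []
    for child in node.children:
        labels.extend(
            classify_top_down(doc, child, internal_threshold, leaf_threshold)
        )
    return labels


# Illustrative data: the document truly belongs to "wheat", but the
# higher-level "grain" classifier is only weakly confident about it.
doc = "wheat harvest report"
wheat = Node("wheat", make_scorer({doc: 0.9}))
corn = Node("corn", make_scorer({doc: 0.1}))
grain = Node("grain", make_scorer({doc: 0.4}), [wheat, corn])

# Same threshold everywhere: the doc is blocked at "grain".
blocked = classify_top_down(doc, grain, 0.5, 0.5)

# Lower threshold at the internal node only: the doc reaches "wheat".
recovered = classify_top_down(doc, grain, 0.3, 0.5)

print(blocked)
print(recovered)
```

In this toy setup the leaf classifier for "wheat" would have accepted the document, but it is never consulted unless the internal node's threshold is relaxed. A lower internal threshold can also admit documents that should have been rejected, which is why the choice of threshold is a trade-off rather than a free gain.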
Keywords
Data mining, text mining, classification
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
IEEE Transactions on Knowledge and Data Engineering
Volume
16
Issue
10
First Page
1305
Last Page
1308
ISSN
1041-4347
Identifier
10.1109/TKDE.2004.50
Publisher
IEEE
Citation
LIM, Ee Peng; SUN, Aixin; NG, Wee-Keong; and SRIVASTAVA, Jaideep.
Blocking reduction strategies in hierarchical text classification. (2004). IEEE Transactions on Knowledge and Data Engineering. 16, (10), 1305-1308.
Available at: https://ink.library.smu.edu.sg/sis_research/124
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/TKDE.2004.50
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons