Publication Type
Journal Article
Version
publishedVersion
Publication Date
5-2003
Abstract
This paper reports our comparative evaluation of three machine learning methods, namely k Nearest Neighbor (kNN), Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM), for Chinese document categorization. Based on two Chinese corpora, a series of controlled experiments evaluated their learning capabilities and efficiency in mining text classification knowledge. Benchmark experiments showed that their predictive performance was roughly comparable, especially on clean and well-organized data sets. While kNN and ARAM yielded better performance than SVM on small and clean data sets, SVM and ARAM significantly outperformed kNN on noisy data. Comparing efficiency, kNN was notably more costly in terms of time and memory than the other two methods. SVM was highly efficient in learning from well-organized samples of moderate size, although on relatively large and noisy data the efficiency of SVM and ARAM was comparable.
Keywords
text categorization, machine learning, comparative experiments
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems | Software Engineering
Research Areas
Data Science and Engineering
Publication
Applied Intelligence
Volume
18
Issue
3
First Page
311
Last Page
322
ISSN
0924-669X
Identifier
10.1023/A:1023202221875
Publisher
Springer (part of Springer Nature): Springer Open Choice Hybrid Journals
Citation
HE, Ji; TAN, Ah-hwee; and TAN, Chew-Lim. On machine learning methods for Chinese document classification. (2003). Applied Intelligence. 18, (3), 311-322.
Available at: https://ink.library.smu.edu.sg/sis_research/5243
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1023%2FA%3A1023202221875
Included in
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Software Engineering Commons