Publication Type

Journal Article

Version

publishedVersion

Publication Date

1-2012

Abstract

Text categorization is an important tool for managing and organizing the rapidly growing volume of text data. Many text categorization algorithms have been explored in the literature, such as KNN, Naive Bayes and Support Vector Machine. KNN text categorization is effective but computationally inefficient. In this paper, we propose an improved KNN algorithm for text categorization, which builds the classification model by combining a constrained one-pass clustering algorithm with KNN text categorization. Empirical results on three benchmark corpora show that our algorithm substantially reduces the number of text similarity computations and outperforms state-of-the-art KNN, Naive Bayes and Support Vector Machine classifiers. In addition, the classification model constructed by the proposed algorithm can be updated incrementally, and it scales well in many real-world applications. © 2011 Elsevier Ltd. All rights reserved.
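
The abstract gives only the outline of the method, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes sparse term-weight vectors, cosine similarity, a similarity threshold for opening new clusters, a "constraint" that keeps each cluster to a single label, and a similarity-weighted vote over the k nearest cluster centroids. All function names and parameters (`one_pass_cluster`, `classify`, `threshold`, `k`) are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def one_pass_cluster(docs, threshold, clusters=None):
    """Scan docs once: join the most similar same-label cluster, else open a new one.

    Passing an existing `clusters` list extends the model with new documents,
    which is one way an incremental update could work (an assumption).
    """
    clusters = [] if clusters is None else clusters
    for vec, label in docs:
        best, best_sim = None, -1.0
        for c in clusters:
            if c["label"] != label:  # assumed constraint: clusters stay label-pure
                continue
            # Cosine ignores scale, so the summed vector stands in for the centroid.
            sim = cosine(vec, c["sum"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            for t, w in vec.items():
                best["sum"][t] = best["sum"].get(t, 0.0) + w
            best["n"] += 1
        else:
            clusters.append({"label": label, "sum": dict(vec), "n": 1})
    return clusters


def classify(clusters, vec, k):
    """Similarity-weighted vote over the k most similar cluster centroids."""
    nearest = sorted(
        ((cosine(vec, c["sum"]), c["label"]) for c in clusters), reverse=True
    )[:k]
    votes = defaultdict(float)
    for sim, label in nearest:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy data, invented purely for illustration.
train = [
    ({"spam": 1.0, "offer": 1.0}, "spam"),
    ({"spam": 1.0, "offer": 2.0}, "spam"),
    ({"meeting": 1.0, "agenda": 1.0}, "ham"),
    ({"meeting": 2.0, "agenda": 1.0}, "ham"),
]
model = one_pass_cluster(train, threshold=0.8)
print(len(model), classify(model, {"offer": 1.0}, k=1))
```

Under these assumptions, a query is compared against cluster centroids instead of every training document, which is where a reduction in similarity computations would come from.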

Keywords

Text categorization, KNN text categorization, One-pass clustering, Spam filtering

Discipline

Databases and Information Systems | Theory and Algorithms

Research Areas

Data Science and Engineering

Publication

Expert Systems with Applications

Volume

39

Issue

1

First Page

1503

Last Page

1509

ISSN

0957-4174

Identifier

10.1016/j.eswa.2011.08.040

Publisher

Elsevier

Citation

JIANG, Shengyi; PANG, Guansong; WU, Meiling; and KUANG, Limin. An improved K-nearest-neighbor algorithm for text categorization. (2012). Expert Systems with Applications. 39, (1), 1503-1509.
Available at: https://ink.library.smu.edu.sg/sis_research/7542

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://doi.org/10.1016/j.eswa.2011.08.040

Included in

Databases and Information Systems Commons, Theory and Algorithms Commons

Research Collection School Of Computing and Information Systems

An improved K-nearest-neighbor algorithm for text categorization
