Research Collection School Of Computing and Information Systems

On strategies for imbalanced text classification using SVM: A comparative study

Publication Type

Journal Article

Version

publishedVersion

Publication Date

12-2009

Abstract

Many real-world text classification tasks involve imbalanced training examples. The strategies proposed to address imbalanced classification (e.g., resampling and instance weighting), however, have not been systematically evaluated in the text domain. In this paper, we conduct a comparative study on the effectiveness of these strategies in the context of imbalanced text classification using the Support Vector Machines (SVM) classifier. SVM was chosen for this study because of the good classification accuracy reported for it in many text classification tasks. We propose a taxonomy that organizes all the proposed strategies according to the training and test phases of text classification tasks. Based on the taxonomy, we survey the methods proposed to address imbalanced classification. Among them, 10 commonly used methods were evaluated in our experiments on three benchmark datasets: Reuters-21578, 20-Newsgroups, and WebKB. Using the area under the Precision–Recall Curve as the performance measure, our experimental results showed that the best decision surface was often learned by the standard SVM alone, without any of the proposed strategies. We believe this negative finding will benefit both researchers and application developers in the area by encouraging them to focus more on thresholding strategies.
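A note on the performance measure: the area under the precision–recall curve depends only on how test documents are ranked by the classifier's decision values, not on where the decision threshold is placed, so it reflects the learned decision surface rather than one particular threshold. The following Python sketch illustrates this under stated assumptions: it approximates the area as non-interpolated average precision, the function name `pr_auc` is hypothetical, and the exact computation used in the study is not given on this page.

```python
from typing import Sequence


def pr_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the area under the precision-recall curve as
    non-interpolated average precision (an assumption, not necessarily
    the study's exact method).

    scores: SVM decision values, higher means more likely positive.
    labels: 1 for the (minority) positive class, 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank documents by decreasing decision value; sorted() stays stable
    # with reverse=True, so ties keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision at the recall level reached by this positive.
            precision_sum += hits / rank
    return precision_sum / total_pos
```

For example, `pr_auc([0.9, 0.8, 0.7, 0.6], [0, 1, 0, 1])` returns 0.5, and subtracting the same constant from every score (equivalent to moving the SVM threshold) leaves the value unchanged, because the ranking does not change.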

Keywords

Imbalanced text classification, Support Vector Machines, SVM, Resampling, Instance weighting

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Publication

Decision Support Systems

Volume

48

Issue

1

First Page

191

Last Page

201

ISSN

0167-9236

Identifier

10.1016/j.dss.2009.07.011

Publisher

Elsevier

Citation

SUN, Aixin; LIM, Ee Peng; and LIU, Ying. On strategies for imbalanced text classification using SVM: A comparative study. (2009). Decision Support Systems. 48, (1), 191-201.
Available at: https://ink.library.smu.edu.sg/sis_research/757

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://doi.org/10.1016/j.dss.2009.07.011

Included in

Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons