Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

9-2004

Abstract

For many clustering algorithms, such as k-means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images or biological data. The fundamental question this paper addresses is: “How can we effectively estimate the natural number of clusters in a given text collection?” We propose to use spectral analysis, which analyzes the eigenvalues (not eigenvectors) of the collection, as the solution to the above. We first present the relationship between a text collection and its underlying spectra. We then show how the answer to this question enhances the clustering process. Finally, we conclude with empirical results and related work.
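
To illustrate the general idea of reading a cluster count off eigenvalues alone, here is a minimal sketch in Python. It is not the procedure from the paper, which the abstract does not spell out. It swaps in the widely used eigengap heuristic on a degree-normalized similarity matrix. The function name `estimate_num_clusters`, the parameter `max_k`, and the toy similarity matrix are all assumptions made for this example.

```python
import numpy as np


def estimate_num_clusters(similarity, max_k=None):
    """Guess a cluster count from the eigenvalue spectrum (eigengap heuristic).

    Assumes `similarity` is a symmetric, non-negative matrix in which every
    row has a positive sum. This is an illustrative stand-in, not the
    method described in the paper.
    """
    a = np.asarray(similarity, dtype=float)
    degree = a.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalization D^{-1/2} A D^{-1/2}; its eigenvalues lie in [-1, 1].
    normalized = a * inv_sqrt[:, None] * inv_sqrt[None, :]
    # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them.
    eigenvalues = np.linalg.eigvalsh(normalized)[::-1]
    if max_k is None:
        max_k = len(eigenvalues) - 1
    # Gap between each of the top max_k eigenvalues and its successor.
    gaps = eigenvalues[:max_k] - eigenvalues[1:max_k + 1]
    # The largest gap marks where the "large" eigenvalues end.
    return int(np.argmax(gaps)) + 1


# Hypothetical toy data: three groups of documents (sizes 4, 3, 5) that are
# strongly similar within a group and only weakly similar across groups.
sizes = [4, 3, 5]
labels = np.repeat(np.arange(len(sizes)), sizes)
sim = np.where(labels[:, None] == labels[None, :], 1.0, 0.01)

k = estimate_num_clusters(sim)
print(k)
```

On this toy matrix the three largest eigenvalues sit close to 1 and the rest are near 0, so the largest gap falls after the third eigenvalue. Only eigenvalues are computed; no eigenvectors are needed for the estimate.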

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Publication

Knowledge Discovery in Databases: PKDD 2004: 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, September 20-24: Proceedings

Volume

3202

First Page

301

Last Page

312

ISBN

9783540301165

Identifier

10.1007/978-3-540-30116-5_29

Publisher

Springer Verlag

City or Country

Pisa, Italy

Citation

LI, Wenyuan; NG, Wee-Keong; ONG, Kok-Leong; and LIM, Ee Peng. A spectroscopy of texts for effective clustering. (2004). Knowledge Discovery in Databases: PKDD 2004: 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, September 20-24: Proceedings. 3202, 301-312.
Available at: https://ink.library.smu.edu.sg/sis_research/1018

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://doi.org/10.1007/978-3-540-30116-5_29

Included in

Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons

Research Collection School Of Computing and Information Systems

A spectroscopy of texts for effective clustering
