Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

12-2006

Abstract

Techniques for finding document clusters mostly depend on models that impose strong explicit and/or implicit a priori assumptions. As a consequence, the resulting clusters tend to be unnatural and stray from the intrinsic grouping nature of a document collection. We apply a novel graph-theoretic technique called the Clique Percolation Method (CPM) to document clustering. In this method, highly cohesive maximal document cliques are enumerated in a random graph, and strongly adjacent cliques are merged to form naturally overlapping clusters. Our clustering results can unveil the inherent structural connections of the underlying data. Experiments show that CPM can outperform some typical algorithms on benchmark data sets, and they shed light on its advantages for natural document clustering.
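
The two steps named in the abstract (enumerating maximal cliques, then merging adjacent cliques into overlapping clusters) can be illustrated with a minimal sketch of the general clique percolation idea. This is not the authors' implementation: the function names, the parameter k, and the adjacency rule (two cliques are merged when they share at least k-1 documents) are assumptions taken from the commonly described form of the method, and the way edges between documents are derived is not stated in the abstract, so the sketch simply takes an edge list as input.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_percolation(edges, k=3):
    """Return overlapping clusters as sorted lists of node labels.

    Assumption: cliques with at least k nodes are kept, and two cliques
    are treated as adjacent when they share at least k - 1 nodes.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]

    # Union-find over clique indices to merge chains of adjacent cliques.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical document graph: an edge means two documents are similar.
    edges = [
        ("a", "b"), ("a", "c"), ("b", "c"),  # clique {a, b, c}
        ("b", "d"), ("c", "d"),              # clique {b, c, d}
        ("d", "e"), ("d", "f"), ("e", "f"),  # clique {d, e, f}
    ]
    print(clique_percolation(edges, k=3))
```

In this toy graph the first two triangles share two documents and are merged, while the third shares only document d with them, so d ends up in both clusters. That is the kind of overlap the abstract refers to.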

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the 21st International Conference on Computer Processing of Oriental Languages (ICCPOL 2006)

First Page

97

Last Page

108

Identifier

10.1007/11940098_10

Publisher

LNAI, Springer

City or Country

Singapore

Additional URL

https://doi.org/10.1007/11940098_10
