Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

11-2022

Abstract

The K-Means clustering algorithm does not offer a clear methodology for determining the appropriate number of clusters; it has no built-in mechanism for K selection. In this paper, we present a new metric for clustering quality and describe its use for K selection. The proposed metric, based on the locations of the centroids as well as the desired properties of the clusters, is developed in two stages. In the initial stage, we take into account the full covariance matrix of the clustering variables, thereby making the metric mathematically similar to a reduced chi-squared. We then extend it to account for how well the clustering results comply with the underlying assumptions of the K-Means algorithm (namely, balanced clusters in terms of variance and membership), and define our final metric (MC). We demonstrate, using synthetic and real data sets, how well our metric performs in determining the right number of clusters to form. We also present detailed comparisons with existing quality indexes for automatic determination of the number of clusters.

Keywords

K-Means clustering, Quality metrics, K selection problem, Number of clusters

Discipline

Computer Engineering | Numerical Analysis and Scientific Computing | Theory and Algorithms

Research Areas

Data Science and Engineering

Publication

Advanced Data Mining and Applications: 18th International Conference, ADMA 2022, Brisbane, Australia, November 28-30: Proceedings

Volume

13726

First Page

208

Last Page

222

ISBN

9783031221361

Identifier

10.1007/978-3-031-22137-8_16

Publisher

Springer

City or Country

Cham

Citation

THULASIDAS, Manoj. A quality metric for K-Means clustering based on centroid locations. (2022). Advanced Data Mining and Applications: 18th International Conference, ADMA 2022, Brisbane, Australia, November 28-30: Proceedings. 13726, 208-222.
Available at: https://ink.library.smu.edu.sg/sis_research/7744

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1007/978-3-031-22137-8_16

Included in

Computer Engineering Commons, Numerical Analysis and Scientific Computing Commons, Theory and Algorithms Commons

Research Collection School Of Computing and Information Systems

A quality metric for K-Means clustering based on centroid locations
