Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

11-2021

Abstract

In this paper, we revisit the decades-old clustering method k-means. The chicken-and-egg loop in traditional k-means is replaced by a purely stochastic optimization procedure, undertaken from the perspective of each individual sample. Unlike existing incremental k-means, each sample is tentatively joined to a new cluster to evaluate its distance to the corresponding new centroid, with the contribution of this sample accounted for. The sample is actually moved to the new cluster only if the reallocation places it closer to the new centroid than to its current one. Compared with traditional k-means and other variants, this new procedure allows the clustering to converge faster to a better local minimum. This fundamental modification of the k-means loop leads to the redefinition of a family of k-means variants, such as hierarchical k-means and sequential k-means. As an extension, a new target function that minimizes the sum of pairwise distances within clusters is presented. Under the l2-norm, it can be solved with the same stochastic optimization procedure. The redefined traditional k-means, hierarchical k-means, and sequential k-means all show considerable performance improvements over their traditional counterparts under different settings and on various types of datasets.
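The sample-wise update rule described in the abstract can be illustrated with a short sketch. The Python below is one literal reading of the abstract, not the authors' implementation: the names `stochastic_kmeans` and `sq_dist`, the random balanced initial assignment, the random visiting order, the guard against emptying a cluster, the choice of the closest qualifying cluster when several qualify, and the `max_epochs` stopping cap are all hypothetical choices made for illustration. Each sample is tentatively added to every other cluster, that cluster's centroid is recomputed with the sample's own contribution included, and the sample moves only if it is closer to that tentative centroid than to its current centroid.

```python
import random
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean (l2) distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def stochastic_kmeans(data: Sequence[Sequence[float]], k: int,
                      max_epochs: int = 50, seed: int = 0) -> List[int]:
    """Hypothetical sketch: sample-wise k-means with tentative moves.

    Returns one cluster label per sample.
    """
    n, dim = len(data), len(data[0])
    assert 1 <= k <= n, "need at least one sample per cluster"
    rng = random.Random(seed)

    # Assumed initialisation: random balanced labels, so no cluster starts empty.
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)

    # Per-cluster coordinate sums and sizes; a centroid is sums[c] / sizes[c].
    sums = [[0.0] * dim for _ in range(k)]
    sizes = [0] * k
    for x, c in zip(data, labels):
        sizes[c] += 1
        for d in range(dim):
            sums[c][d] += x[d]

    for _ in range(max_epochs):
        moved = 0
        order = list(range(n))
        rng.shuffle(order)  # visit samples in random order
        for i in order:
            x, u = data[i], labels[i]
            if sizes[u] == 1:
                continue  # assumed guard: never empty a cluster
            # Distance to the current centroid, which still includes x itself.
            best, best_d = u, sq_dist(x, [s / sizes[u] for s in sums[u]])
            for v in range(k):
                if v == u:
                    continue
                # Tentative centroid of v after x joins: (sum_v + x) / (n_v + 1).
                cand = [(s + xd) / (sizes[v] + 1) for s, xd in zip(sums[v], x)]
                d_v = sq_dist(x, cand)
                if d_v < best_d:
                    best, best_d = v, d_v
            if best != u:
                # Commit the move by updating both clusters' sums and sizes.
                for d in range(dim):
                    sums[u][d] -= x[d]
                    sums[best][d] += x[d]
                sizes[u] -= 1
                sizes[best] += 1
                labels[i] = best
                moved += 1
        if moved == 0:
            break  # no sample prefers another cluster
    return labels
```

For example, `stochastic_kmeans([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [100.0, 100.0], [100.1, 100.0], [100.0, 100.1]], k=2)` separates the two well-separated groups. Keeping per-cluster sums makes each tentative centroid an O(dim) computation rather than a pass over the cluster. This literal criterion is not the same as directly checking the change in the within-cluster sum of squared errors, so the sketch makes no convergence claim and relies on `max_epochs` to bound its run time. Regarding the pairwise target function: if the pairwise distance is taken to be the squared l2 distance (an assumption, since the abstract only says l2-norm), the sum over ordered pairs within a cluster C equals 2|C| times the sum of squared distances from C's members to its centroid, which is why a centroid-based procedure can still be applied.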

Keywords

Driven function, k-means, Stochastic optimization

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Conference, November 1-5

First Page

2679

Last Page

2687

ISBN

9781450384469

Identifier

10.1145/3459637.3482359

Publisher

ACM

City or Country

New York

Citation

ZHAO, Wan-Lei; LAN, Shi Ying; CHEN, Run-Qing; and NGO, Chong-wah. K-sums clustering: A stochastic optimization approach. (2021). CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Conference, November 1-5. 2679-2687.
Available at: https://ink.library.smu.edu.sg/sis_research/6806

Copyright Owner and License

Publisher

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1145/3459637.3482359

Included in

Databases and Information Systems Commons

Research Collection School Of Computing and Information Systems

K-sums clustering: A stochastic optimization approach