Publication Type

Conference Proceeding Article

Version

submittedVersion

Publication Date

12-2022

Abstract

We teach K-Means clustering in introductory data analytics courses because it is one of the simplest and most widely used unsupervised machine learning algorithms. However, one drawback of this algorithm is that it does not offer a clear method to determine the appropriate number of clusters; it does not have a built-in mechanism for K selection. What is usually taught as the solution for the K Selection problem is the so-called elbow method, where we look at the incremental changes in some quality metric (usually, the sum of squared errors, SSE), trying to find a sudden change. In addition to SSE, we can find many other metrics and methods in the literature. In this paper, we survey several of them, and conclude that the Variance Ratio Criterion (VRC) is an appropriate metric we should consider teaching for K Selection. From a pedagogical perspective, VRC has desirable mathematical properties, which help emphasize the statistical underpinnings of the algorithm, thereby reinforcing the students’ understanding through experiential learning. We also list the key concepts targeted by the VRC approach and provide ideas for assignments.
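
The K selection idea in the abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the paper's own code or assignment material: it uses a synthetic data set of three blobs, a plain-Python K-Means with a deterministic farthest-first initialisation, and hypothetical helper names (`kmeans`, `sse_and_vrc`). It computes the SSE used by the elbow method alongside the VRC, taken here in its usual form VRC = [SSB / (K - 1)] / [SSW / (N - K)], where SSW is the within-cluster sum of squares (the SSE) and SSB is the between-cluster sum of squares.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with farthest-first initialisation (an assumption
    made here to keep the example deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return labels, centers


def sse_and_vrc(points, labels, centers):
    """Return (SSE, VRC) for a clustering with K >= 2 clusters."""
    n, k = len(points), len(centers)
    grand_mean = mean(points)
    ssw = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    ssb = sum(
        labels.count(j) * sq_dist(centers[j], grand_mean) for j in range(k)
    )
    return ssw, (ssb / (k - 1)) / (ssw / (n - k))


# Synthetic data (an assumption for illustration): three 2-D blobs of 30 points.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in true_centers
    for _ in range(30)
]

sse, vrc = {}, {}
for k in range(2, 7):
    labels, centers = kmeans(points, k)
    sse[k], vrc[k] = sse_and_vrc(points, labels, centers)
    print(f"K={k}  SSE={sse[k]:10.2f}  VRC={vrc[k]:10.2f}")

best_k = max(vrc, key=vrc.get)  # K with the largest VRC
print("K selected by VRC:", best_k)
```

With SSE one has to judge by eye where the curve bends, whereas VRC gives a single value per K that can simply be maximised, which is one reason it may be easier to explain in an introductory course.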

Keywords

K-Means Clustering, Quality Metrics, K Selection, Variance Ratio Criterion

Discipline

Higher Education | Numerical Analysis and Scientific Computing

Research Areas

Data Science and Engineering

Publication

2022 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE): Hong Kong, December 4-7: Proceedings

First Page

46

Last Page

53

ISBN

9781665491174

Identifier

10.1109/TALE54877.2022.00016

Publisher

IEEE

City or Country

Piscataway, NJ

Citation

THULASIDAS, Manoj. A recommendation on how to teach K-means in introductory analytics courses. (2022). 2022 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE): Hong Kong, December 4-7: Proceedings. 46-53.
Available at: https://ink.library.smu.edu.sg/sis_research/7679

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

https://doi.org/10.1109/TALE54877.2022.00016

Included in

Higher Education Commons, Numerical Analysis and Scientific Computing Commons

Research Collection School Of Computing and Information Systems

A recommendation on how to teach K-means in introductory analytics courses
