Publication Type
Conference Proceeding Article
Version
submittedVersion
Publication Date
12-2022
Abstract
We teach K-Means clustering in introductory data analytics courses because it is one of the simplest and most widely used unsupervised machine learning algorithms. However, one drawback of this algorithm is that it offers no clear method to determine the appropriate number of clusters; it has no built-in mechanism for K Selection. The solution usually taught for the K Selection problem is the so-called elbow method, where we look at the incremental changes in some quality metric (usually the sum of squared errors, SSE), trying to find a sudden change. In addition to SSE, many other metrics and methods can be found in the literature. In this paper, we survey several of them and conclude that the Variance Ratio Criterion (VRC) is an appropriate metric we should consider teaching for K Selection. From a pedagogical perspective, VRC has desirable mathematical properties, which help emphasize the statistical underpinnings of the algorithm, thereby reinforcing the students’ understanding through experiential learning. We also list the key concepts targeted by the VRC approach and provide ideas for assignments.
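The contrast the abstract draws between SSE and VRC can be illustrated with a minimal sketch. It is not the paper's own code: the toy data set, the deterministic farthest-point initialisation, and the helper names `kmeans`, `sse`, and `vrc` are all assumptions made for illustration. VRC is computed here in its usual Calinski-Harabasz form, (between-cluster SS / (K - 1)) / (within-cluster SS / (n - K)).

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iters=100):
    """Plain Lloyd iterations. Farthest-point initialisation is an
    assumption chosen only to make this sketch deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers


def sse(points, labels, centers):
    """Within-cluster sum of squared errors."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def vrc(points, labels, centers):
    """Variance Ratio Criterion: (BSS / (K - 1)) / (WSS / (n - K))."""
    n, k = len(points), len(centers)
    grand_mean = tuple(sum(x) / n for x in zip(*points))
    within = sse(points, labels, centers)
    between = sum(labels.count(j) * dist(centers[j], grand_mean) ** 2 for j in range(k))
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: three well-separated 3x3 grids of points.
offsets = (-0.5, 0.0, 0.5)
blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx in offsets for dy in offsets]

scores = {}
for k in range(2, 7):
    labels, centers = kmeans(points, k)
    scores[k] = (sse(points, labels, centers), vrc(points, labels, centers))
    print(f"K={k}  SSE={scores[k][0]:.3f}  VRC={scores[k][1]:.1f}")

best_k = max(scores, key=lambda k: scores[k][1])
print("K with the highest VRC:", best_k)
```

With SSE, the reader has to judge by eye where the curve bends; with VRC, the candidate K is simply the one with the largest value, which on this toy data is the true number of groups. Real data sets are rarely this clean, so the sketch should be read as an illustration of the two metrics rather than a selection procedure.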
Keywords
K-Means Clustering, Quality Metrics, K Selection, Variance Ratio Criterion
Discipline
Databases and Information Systems | Higher Education
Research Areas
Data Science and Engineering
Publication
2022 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE): Hong Kong, December 4-7: Proceedings
First Page
46
Last Page
53
ISBN
9781665491174
Identifier
10.1109/TALE54877.2022.00016
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
THULASIDAS, Manoj. A recommendation on how to teach K-means in introductory analytics courses. (2022). 2022 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE): Hong Kong, December 4-7: Proceedings. 46-53.
Available at: https://ink.library.smu.edu.sg/sis_research/7679
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/TALE54877.2022.00016