Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
12-2020
Abstract
In order to guide our students of machine learning in their statistical thinking, we need conceptually simple and mathematically defensible algorithms. In this paper, we present the Nearest Centroid (NC) algorithm as a pedagogical tool that combines the key concepts behind two foundational algorithms: K-Means clustering and k-Nearest Neighbors (k-NN). In NC, we compute the centroid (as defined in the K-Means algorithm) of the observations belonging to each class in the training data set, and assign a new observation to the class whose centroid is nearest to it (similar to k-NN). Using this straightforward extension, we illustrate how the concepts of probability and statistics are applied in machine learning algorithms. Furthermore, we describe how the practical aspects of validation and performance measurement are carried out. As described toward the end of this paper, the algorithm and the work presented here can easily be converted into labs and reading assignments that cement students' understanding of applied statistics and its connection to machine learning algorithms.
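The following is a minimal sketch in plain Python that makes the prediction rule described in the abstract concrete. It is not the paper's own code. It assumes numeric feature vectors and Euclidean distance, and it breaks ties in favor of whichever class appeared first in the training data. The function names (`fit_centroids`, `predict`, `accuracy`) and the toy data are hypothetical illustrations. The last step computes accuracy on a held-out set, one simple instance of the performance measurement the abstract mentions.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Hypothetical helper: map each class label to the mean (centroid) of its training rows."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    # Divide each per-class sum by the class count to get the centroid.
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance assumed)."""
    # min() returns the first label on ties, i.e. the class seen first in training.
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def accuracy(centroids, X, y):
    """Fraction of held-out observations whose predicted class matches the true class."""
    correct = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return correct / len(y)


# Toy data, made up purely for illustration.
X_train = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
y_train = ["a", "a", "b", "b"]
centroids = fit_centroids(X_train, y_train)
print(centroids)                        # {'a': [1.5, 1.5], 'b': [8.5, 8.5]}
print(predict(centroids, [2.5, 2.0]))   # a

# Held-out data used to measure performance.
X_test = [[0.0, 0.0], [10.0, 10.0], [6.0, 6.0]]
y_test = ["a", "b", "b"]
print(accuracy(centroids, X_test, y_test))  # 1.0
```

Unlike k-NN, which keeps every training observation and searches them at prediction time, this approach stores only one centroid per class. As a result, prediction cost grows with the number of classes rather than the size of the training set.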
Keywords
statistical thinking, applied statistics, machine learning, nearest centroid, k-means clustering, k nearest neighbor
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
2020 IEEE International Conference on Teaching, Assessment, and Learning for Engineering, TALE 2020: Virtual, December 8-11: Proceedings
First Page
9
Last Page
16
ISBN
9781728169422
Identifier
10.1109/TALE48869.2020.9368396
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
THULASIDAS, Manoj.
Nearest Centroid: A bridge between statistics and machine learning. (2020). 2020 IEEE International Conference on Teaching, Assessment, and Learning for Engineering, TALE 2020: Virtual, December 8-11: Proceedings. 9-16.
Available at: https://ink.library.smu.edu.sg/sis_research/5555
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/TALE48869.2020.9368396