Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

12-2020

Abstract

In order to guide our students of machine learning in their statistical thinking, we need conceptually simple and mathematically defensible algorithms. In this paper, we present the Nearest Centroid (NC) algorithm as a pedagogical tool, combining the key concepts behind two foundational algorithms: K-Means clustering and K Nearest Neighbors (k-NN). In NC, we use the centroid (as defined in the K-Means algorithm) of the observations belonging to each class in our training data set and its distance from a new observation (similar to k-NN) for class prediction. Using this obvious extension, we will illustrate how the concepts of probability and statistics are applied in machine learning algorithms. Furthermore, we will describe how the practical aspects of validation and performance measurements are carried out. The algorithm and the work presented here can be easily converted to labs and reading assignments to cement the students' understanding of applied statistics and its connection to machine learning algorithms, as described toward the end of this paper.
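
The prediction rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_centroids` and `predict`, the toy data, and the choice of Euclidean distance are all assumptions made here for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(X, y):
    """Return a dict mapping each class label to the mean of its training points."""
    groups = defaultdict(list)
    for point, label in zip(X, y):
        groups[label].append(point)
    # The centroid is the coordinate-wise mean of the points in each class.
    return {
        label: tuple(sum(column) / len(points) for column in zip(*points))
        for label, points in groups.items()
    }


def predict(point, centroids):
    """Assign a new observation to the class whose centroid is nearest."""
    # Assumption: Euclidean distance; other metrics could be substituted.
    return min(centroids, key=lambda label: dist(point, centroids[label]))


# Hypothetical toy data: two well-separated classes in two dimensions.
X_train = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
y_train = ["a", "a", "a", "b", "b", "b"]

centroids = fit_centroids(X_train, y_train)
prediction_near_a = predict((1.5, 1.5), centroids)
prediction_near_b = predict((7.5, 8.5), centroids)
```

Unlike k-NN, which compares a new observation against every training point, this sketch stores only one centroid per class, so prediction costs one distance computation per class.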

Keywords

statistical thinking, applied statistics, machine learning, nearest centroid, k-means clustering, k nearest neighbor

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

2020 IEEE International Conference on Teaching, Assessment, and Learning for Engineering, TALE 2020: Virtual, December 8-11: Proceedings

First Page

9

Last Page

16

ISBN

9781728169422

Identifier

10.1109/TALE48869.2020.9368396

Publisher

IEEE

City or Country

Piscataway, NJ

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1109/TALE48869.2020.9368396
