The k-means algorithm is relatively simple to implement, so in this chapter we'll write it from scratch. The algorithm requires only two inputs: k, the number of clusters we wish to identify, and the data points to be clustered. Additional parameters, such as the maximum number of iterations to allow, can be supplied but are not required. The only required output is the set of k centroids, that is, the points that represent the centers of the clusters in the data. If k = 3, the algorithm must return three centroids. It may also report other metrics, such as the total error or the number of iterations needed to reach a steady state, but again these are optional.
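The inputs and outputs described above can be made concrete with a minimal Python sketch. This is not the chapter's own implementation. The function name `kmeans`, its optional `max_iterations` and `seed` parameters, the choice of k random data points as initial centroids, Euclidean distance, and the definition of total error as the sum of squared distances from each point to its nearest centroid are all assumptions made for illustration.

```python
import math
import random


def kmeans(k, points, max_iterations=100, seed=None):
    """Hypothetical helper (illustrative only).

    Required inputs: k and points. Optional input: max_iterations.
    Returns (centroids, total_error, iterations); only centroids is essential.
    """
    if k <= 0 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Assumption: initialize by picking k distinct data points at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Steady state: centroids no longer move.
        centroids = new_centroids
    # Assumed error metric: sum of squared distances to the nearest centroid.
    total_error = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, total_error, iterations
```

For example, `kmeans(2, [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)], seed=0)` returns two centroids, one near (1.17, 1.17) and one at (8.5, 8.5), in an order that depends on the random initialization. If the loop exhausts `max_iterations` without the centroids settling, the returned iteration count simply equals that limit.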
A high-level description of the k-means algorithm is as follows:
- Given the parameter...