The K-means algorithm is also referred to as vector quantization. What the algorithm does is finds the cluster (centroid) positions that minimize the distances to all points in the cluster. This is done iteratively; the problem with the algorithm is that it can be a bit greedy, meaning that it will find the nearest minima quickly. This is generally solved with some kind of basin-hopping approach where the nearest minima found is randomly perturbed and the algorithm restarted. Due to this fact, the algorithm is dependent on good initial guesses as input.
As mentioned in Chapter 4 , Regression, we will analyze the data of suicide rates versus GDP versus absolute latitude or Degrees From Equator (DFE) for clusters. Our hypothesis from the visual inspection was that there were at least two distinct clusters, one with a higher suicide rate, GDP, and absolute latitude, and one with lower. We saved an HDF file in Chapter 4 ...