The k-means clustering algorithm works by computing the mean of the features, that is, of the variables we use for clustering. For example, we might segment customers by their average transaction amount and the average number of products they purchase per quarter. The mean of the points in a group then becomes the center of that cluster. K is the number of clusters: the technique computes K means, and the data are clustered around these K means, hence the name.
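To make the idea of "the mean becomes the center" concrete, here is a minimal sketch in plain Python. The customer values and the helper name `centroid` are made up for illustration; they are assumptions, not data or code from the text.

```python
from statistics import mean

# Hypothetical customers already grouped into one cluster. Each tuple is
# (average transaction amount, average products purchased per quarter).
customers = [(20.0, 2.0), (30.0, 4.0), (40.0, 6.0)]


def centroid(points):
    """Return the feature-wise mean of a list of equal-length tuples."""
    # zip(*points) regroups the data by feature, so each feature is averaged
    # separately across all customers.
    return tuple(mean(feature) for feature in zip(*points))


center = centroid(customers)
print(center)  # the cluster center: one mean per feature
```

Each coordinate of the center is simply the average of one feature over the customers in the group.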
How do we choose this K? If we have some idea of what we are looking for or how many clusters we expect or want, then we can set K to be this number before we start the engines and let the algorithm compute along.
If we don't know how many clusters there are, then our exploration will take a little longer and involve some trial and error, for example, trying K = 3, 4, and 5 in turn.
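The trial-and-error loop over K might look something like the sketch below. It assumes a plain Lloyd-style k-means (random data points as starting centers, then alternating assignment and mean-update steps), written in standard-library Python. The data, the function names `sq_dist` and `kmeans`, and the choice of the within-cluster sum of squares as the number to compare are all assumptions made for illustration, not something the text prescribes.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Illustrative k-means; returns (centers, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # K data points picked at random
    for _ in range(iters):
        # Assignment step: put each point in the group of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # nothing moved, so we have converged
            break
        centers = new_centers
    wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wcss


# Hypothetical customers: (average transaction amount, products per quarter).
points = [
    (20.0, 2.0), (22.0, 3.0), (25.0, 2.0),
    (60.0, 5.0), (62.0, 6.0), (65.0, 5.0),
    (110.0, 9.0), (112.0, 10.0), (115.0, 9.0),
    (150.0, 1.0), (152.0, 2.0), (155.0, 1.0),
]

# Try several values of K and record how tightly the points are grouped.
results = {k: kmeans(points, k) for k in (3, 4, 5)}
for k, (centers, wcss) in results.items():
    print(f"K={k}: within-cluster sum of squares = {wcss:.2f}")
```

A smaller sum of squares means tighter clusters, but it tends to shrink as K grows, so the numbers are a guide for judgment rather than an automatic answer. Because the starting centers are random, a different `seed` can give a different result for the same K.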
The k-means algorithm is iterative. It starts by choosing K points at random from the data and uses these as cluster...