Choosing the number of clusters
If you don't know in advance how many clusters your data contains, how do you choose the optimal k? This is essentially a chicken-and-egg problem: you need to cluster the data to see how many clusters there are, but you need to know the number of clusters before you can cluster. Several approaches are popular; we'll discuss one of them: the elbow method.
Do you remember that mysterious WCSS (within-cluster sum of squares) that we calculated on every iteration of k-means? This measure tells us how far the points in each cluster deviate from their centroid: it is the sum of squared distances between every point and the centroid of its cluster. We can calculate it for several different values of k and plot the results. The plot usually looks something like the following graph:
Figure 4.3: WCSS plotted against the number of clusters
This plot should remind you of the loss function plots from Chapter 3, K-Nearest Neighbors Classifier: it shows how well our model fits the data. The idea of the elbow method is to choose the value of k beyond which the result no longer improves sharply. The name comes from the plot's resemblance to an arm: we choose the point at the elbow, marked...
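To make the procedure concrete, here is a minimal sketch of the elbow method in Python, assuming scikit-learn and matplotlib are available; the k-means implementation we built earlier could be plugged in just as well. It runs k-means for a range of k values, records the WCSS of each run (scikit-learn exposes it as the inertia_ attribute), and plots WCSS against k so the elbow can be read off the curve. The synthetic dataset is only for illustration.

```python
# A sketch of the elbow method, assuming scikit-learn and matplotlib.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data; the true number of clusters (4) is hidden from the method.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

ks = range(1, 11)
wcss = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(model.inertia_)  # inertia_ is the WCSS for this value of k

# Plot WCSS against k and look for the "elbow" where the curve flattens.
plt.plot(list(ks), wcss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("WCSS")
plt.show()
```

On data like this, the curve typically drops steeply up to the true number of clusters and then flattens, which is exactly the bend we pick as the elbow.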