Clustering is a type of unsupervised machine learning analysis. Although in general we don't know what the best clusters are, we can still get an idea of how good a clustering result is. One way is to calculate the silhouette coefficients, as defined in the following equation:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
In the preceding equation, a(i) is the average dissimilarity of sample i to the other samples in the same cluster. A small a(i) indicates that the sample is well matched to its own cluster. b(i) is the lowest average dissimilarity of i to any other cluster, so it identifies the next best cluster for i. The value of s(i) ranges from -1 to 1. If the silhouette coefficient s(i) of a sample is close to 1, the sample is properly assigned; a value near 0 means the sample lies close to the boundary between two clusters, and a negative value suggests it would fit better in the neighboring cluster. The average of the silhouette coefficients over all samples measures the quality of the clustering.
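To make the definition concrete, here is a minimal NumPy sketch that computes s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every sample. It assumes Euclidean distance as the dissimilarity measure and at least two clusters; the function name `silhouette_samples_np` and the toy points are hypothetical, chosen only for illustration. Following the usual convention, a sample that is alone in its cluster gets s(i) = 0.

```python
import numpy as np

def silhouette_samples_np(X, labels):
    """Return s(i) for every sample, using Euclidean distance as the dissimilarity."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (n_samples, n_samples)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude sample i itself when computing a(i)
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = dist[i, same].mean()
        # b(i): lowest average dissimilarity of i to the samples of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s

# Hypothetical toy data: two tight groups of points far apart from each other
X = [[0, 0], [0, 1], [5, 5], [5, 6]]
print(silhouette_samples_np(X, [0, 0, 1, 1]).mean())  # sensible assignment
print(silhouette_samples_np(X, [0, 1, 0, 1]).mean())  # deliberately bad assignment
```

The sensible assignment gives every sample a coefficient close to 1, while mixing the two groups makes every coefficient negative. In practice, scikit-learn provides `sklearn.metrics.silhouette_samples` and `sklearn.metrics.silhouette_score` for the same computation.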
We can use the mean silhouette coefficient to inform our choice of the number of clusters for the K-means clustering algorithm...
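The selection procedure can be sketched as follows: run K-means for several candidate values of k and keep the one with the highest mean silhouette coefficient. This is a self-contained sketch, not the author's code. To avoid depending on scikit-learn, it uses a plain Lloyd's K-means with a deterministic farthest-point initialization (an assumption made so the result is reproducible), Euclidean distance, and hypothetical synthetic data consisting of three well-separated Gaussian blobs. The helper names `kmeans` and `mean_silhouette` are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means with farthest-point initialization (illustrative only)."""
    # Start from X[0], then repeatedly add the point farthest from all chosen centers
    centers = [X[0]]
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=-1)
        centers.append(X[d.min(axis=1).argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each sample to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its samples; keep it if the cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def mean_silhouette(X, labels):
    """Average of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all samples."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_others = own.sum() - 1  # samples in the same cluster, excluding i
        if n_others == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = dist[i, own].sum() / n_others  # dist[i, i] == 0, so it adds nothing
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()

# Hypothetical data: three Gaussian blobs with 50 points each
rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in true_centers])

# Score each candidate number of clusters and keep the best one
scores = {k: mean_silhouette(X, kmeans(X, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: mean silhouette = {score:.3f}")
print("best k:", best_k)
```

On data this well separated, the mean silhouette should peak at k = 3, the number of blobs used to generate it. With scikit-learn available, the same loop would typically call `KMeans(n_clusters=k).fit_predict(X)` and `sklearn.metrics.silhouette_score(X, labels)` instead of the hand-written helpers.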