Silhouette information is a measure used to validate a clustering of data. In the previous recipe, we mentioned that evaluating a clustering involves measuring how tightly the data is grouped within each cluster and how far apart the different clusters are from each other. The silhouette coefficient combines these intra-cluster and inter-cluster distance measurements. The coefficient ranges from -1 to 1, and for a reasonable clustering it typically falls between 0 and 1; the closer to 1, the better the clustering is. In this recipe, we will introduce how to compute silhouette information.
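To make the combination of intra-cluster and inter-cluster distance concrete, here is a minimal Python sketch of the standard silhouette definition, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its own cluster and b(i) is the smallest mean distance from point i to the members of any other cluster. It is not the code used in this recipe: the function name `silhouette_values`, the toy points, and the choice to return 0 for singleton clusters and degenerate cases are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def silhouette_values(points, labels):
    """Return the silhouette value s(i) for every point (illustrative helper)."""
    values = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Collect distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(lab, []).append(dist(p, q))

        # Assumed convention: a point alone in its cluster, or data with only
        # one cluster, gets a silhouette value of 0.
        if own not in by_cluster or len(by_cluster) < 2:
            values.append(0.0)
            continue

        # a: mean intra-cluster distance; b: mean distance to the nearest other cluster.
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)

        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Toy data: two tight, well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]

good = silhouette_values(points, [0, 0, 1, 1])  # labels match the geometry
bad = silhouette_values(points, [0, 1, 0, 1])   # labels cut across the pairs

print(sum(good) / len(good))
print(sum(bad) / len(bad))
```

With labels that match the geometry, the average silhouette comes out at roughly 0.90. With the labels that cut across the pairs it is roughly -0.45, which shows that a poor assignment can push the coefficient below 0.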
To extract silhouette information from a clustering, you need to have completed the previous recipe and generated the hotel location dataset.