So far, we have built different clustering algorithms but haven't measured their performance. In supervised learning, we can simply compare the predicted values with the original labels to compute accuracy. In unsupervised learning, we don't have any labels, so we need a way to measure the performance of our algorithms that doesn't depend on them.
A good way to evaluate a clustering algorithm is to look at the clusters it produces. Are the clusters well separated? Are the datapoints within each cluster tightly grouped? We need a metric that can quantify this behavior. We will use a metric called the Silhouette Coefficient score. This score is defined for each datapoint as follows:
score = (y - x) / max(x, y)
Here, x is the average distance between the current datapoint and all the other datapoints in the same cluster, and y is the average distance between the current datapoint and all the datapoints in the next nearest cluster.
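To make the definition concrete, here is a minimal sketch that computes the per-datapoint score with NumPy, using the standard convention (y - x) / max(x, y), so that a score near 1 means the datapoint sits much closer to its own cluster than to the nearest other one. The function name silhouette_per_point and the toy data are illustrative assumptions, not part of any library. The sketch also assumes Euclidean distance, at least two clusters, and no exact duplicate points. In practice, scikit-learn provides silhouette_samples and silhouette_score in sklearn.metrics for the same purpose.

```python
import numpy as np

def silhouette_per_point(X, labels):
    """Illustrative sketch: Silhouette Coefficient for each datapoint.

    Assumes Euclidean distance and at least two distinct clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances between all datapoints
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    unique_labels = np.unique(labels)
    scores = np.zeros(len(X))

    for i in range(len(X)):
        # Other datapoints in the same cluster (exclude the point itself)
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            # A cluster with a single datapoint gets a score of 0 by convention
            continue

        # x: average distance to the other datapoints in the same cluster
        x = dist[i, same].mean()

        # y: average distance to the datapoints in the nearest other cluster
        y = min(dist[i, labels == other].mean()
                for other in unique_labels if other != labels[i])

        scores[i] = (y - x) / max(x, y)

    return scores

# Toy data (hypothetical): two tight, well-separated groups
X = [[0, 0], [0, 1], [5, 0], [5, 1]]

good_scores = silhouette_per_point(X, [0, 0, 1, 1])
bad_scores = silhouette_per_point(X, [0, 1, 0, 1])

print("Good clustering:", good_scores.mean())
print("Bad clustering:", bad_scores.mean())
```

Averaging the per-datapoint scores gives a single number for the whole clustering. On this toy data, the labeling that matches the two visible groups yields a clearly positive average, while the labeling that mixes them yields a negative one.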