A clustering problem consists of selecting and grouping homogeneous items from a set of initial data. To solve this problem, we must:
Identify a measure of resemblance between elements
Determine whether there are subsets of elements that are similar according to the chosen measure
The algorithm determines which elements form a cluster and what degree of similarity unites them within the cluster.
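As an illustration of a resemblance measure, the sketch below uses the Euclidean distance between two points, where a smaller distance means greater similarity. The choice of Euclidean distance, the function name euclidean_distance, and the sample points are assumptions made for this example; the text does not prescribe a particular measure.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical sample points, chosen only for illustration.
origin = (0.0, 0.0)
far_point = (3.0, 4.0)
near_point = (0.0, 1.0)

d_far = euclidean_distance(origin, far_point)
d_near = euclidean_distance(origin, near_point)

# Under this measure, near_point resembles origin more than far_point does.
print(d_near < d_far)
```

Other measures, such as the Manhattan distance or cosine similarity, could be substituted without changing the overall approach.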
Clustering algorithms belong to the unsupervised methods, because we assume no prior information about the structure and characteristics of the clusters.
One of the simplest and most common clustering algorithms is k-means, which partitions a set of objects into k clusters on the basis of their attributes. Each cluster is identified by its centroid, the mean of the points assigned to it.
The algorithm follows an iterative procedure:
Randomly select k points as the initial centroids.
Repeat:
Form k clusters by assigning each point to its closest centroid.
Recompute the centroid of each cluster...
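The iterative procedure above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: it uses Euclidean distance, stops when the centroids no longer change or after max_iter passes (the stopping rule is an assumption, since the text does not state one), and leaves the centroid of an empty cluster where it was. The names k_means, closest_centroid, mean_point, max_iter, and seed are hypothetical.

```python
import math
import random


def closest_centroid(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Return (centroids, clusters) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Randomly select k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Form k clusters by assigning each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest_centroid(p, centroids)].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid
        # (an assumption made for this sketch).
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Assumed stopping rule: the centroids did not move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visibly separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
        (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(data, k=2)
print(centroids)
```

Because the initial centroids are chosen at random, different seeds can lead to different partitions on less clearly separated data; running the procedure several times and keeping the best result is a common precaution.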