The k-means clustering algorithm is a clustering algorithm that groups a set of items not previously classified into a predefined number of clusters, K. It's very popular within the data mining and machine learning world, and is used in these fields to organize and classify data in an unsupervised way.
Each item is normally defined by a vector of characteristics or attributes (we use vector as a math concept, not as a data structure). All the items have the same number of attributes. Each cluster is also defined by a vector with the same number of attributes that represent all the items classified into that cluster. This vector is named the
centroid. For example, if the items are defined by numeric vectors, the clusters are defined by the mean of the items classified into that cluster.
Basically, the algorithm has four steps:
- Initialization: In the first step, you have to create the initial vectors that represent the K clusters. Normally...