The k-means clustering algorithm
In this section, we cover the k-means clustering algorithm in depth. The k-means algorithm is a partitional clustering algorithm.
Let the set of data points (or instances) be as follows:
$D = \{x_1, x_2, \ldots, x_n\}$, where
$x_i = (x_{i1}, x_{i2}, \ldots, x_{ir})$ is a vector in a real-valued space $X \subseteq \mathbb{R}^r$, and $r$ is the number of attributes in the data.
The k-means algorithm partitions the given data into k clusters, each of which has a center called a centroid.
The number of clusters, k, is specified by the user.
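For reference, in the standard formulation the centroid of a cluster is the mean of the data points assigned to it, and the distance from a point to a centroid is usually the Euclidean distance. The symbols $C_j$ (the set of points in cluster $j$) and $m_j$ (its centroid) are introduced here for illustration, and Euclidean distance is an assumption, since this section does not fix a distance function:

```latex
m_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i,
\qquad
\mathrm{dist}(x_i, m_j) = \lVert x_i - m_j \rVert
  = \sqrt{\sum_{l=1}^{r} \left(x_{il} - m_{jl}\right)^2}
```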
Given k, the k-means algorithm works as follows:
Algorithm k-means(k, D)
  choose k data points as the initial centroids (cluster centers)
  repeat
    for each data point x ∈ D do
      compute the distance from x to each centroid
      assign x to the closest centroid (a centroid represents a cluster)
    endfor
    re-compute the centroids using the current cluster memberships
  until the stopping criterion is met
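The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `kmeans` and its `max_iter` and `seed` parameters are hypothetical, the initial centroids are chosen as k random data points, the distance is Euclidean, and the loop stops when no point changes cluster (or after `max_iter` iterations).

```python
import math
import random


def kmeans(k, D, max_iter=100, seed=0):
    """Hypothetical sketch: cluster the points in D into k clusters.

    D is a list of n points, each a sequence of r real-valued attributes.
    Returns (centroids, labels), where labels[i] is the cluster index of D[i].
    """
    if not 1 <= k <= len(D):
        raise ValueError("k must be between 1 and the number of data points")
    rng = random.Random(seed)
    # Choose k distinct data points as the initial centroids (random seeding is an assumption).
    centroids = [list(p) for p in rng.sample(D, k)]
    labels = None
    for _ in range(max_iter):
        # For each data point, compute the Euclidean distance to each centroid
        # and assign the point to the closest one (ties go to the lower index).
        new_labels = [
            min(range(k), key=lambda j: math.dist(x, centroids[j])) for x in D
        ]
        # Stopping criterion assumed here: no point changed cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Re-compute each centroid as the mean of its current members.
        for j in range(k):
            members = [x for x, lab in zip(D, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

For example, `kmeans(2, [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)])` places the first three points in one cluster and the last three in the other. Keeping the previous centroid when a cluster becomes empty is one simple choice among several; other seeding strategies and stopping criteria are possible.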
Convergence or stopping criteria for k-means clustering
The following list describes the convergence...