10.2 INTRODUCTION TO THE k‐MEANS CLUSTERING ALGORITHM
There are many different clustering methods, including hierarchical clustering, Kohonen networks clustering, and BIRCH clustering.1 Here, we shall focus on the k‐means clustering algorithm. The k‐means clustering algorithm2 is a straightforward and effective algorithm for finding clusters in data. The algorithm proceeds as follows:
- Step 1: Ask the user how many clusters k the data set should be partitioned into.
- Step 2: Randomly select k records to serve as the initial cluster center locations.
- Step 3: For each record, find the nearest cluster center. Thus, in a sense, each cluster center “owns” a subset of the records, thereby representing a partition of the data set. We therefore have k clusters, C1, C2, …, Ck.
- Step 4: For each of the k clusters, find the cluster centroid and update the location of each cluster center to the new value of the centroid.
- Step 5: Repeat steps 3 and 4 until convergence...
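The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation, and it rests on several assumptions the text does not state: distance is Euclidean, "convergence" means that no record changes cluster between passes, a cluster that becomes empty keeps its previous center, and `max_iter` caps the number of passes. The names `k_means`, `records`, `max_iter`, and `seed` are hypothetical and chosen only for this example.

```python
import math
import random


def k_means(records, k, max_iter=100, seed=0):
    """Partition `records` (equal-length numeric tuples) into k clusters.

    Step 1 (choosing k) is left to the caller.
    Returns (centers, assignments), where assignments[i] is the index
    of the cluster that owns records[i].
    """
    rng = random.Random(seed)
    # Step 2: pick k records at random as the initial cluster centers.
    centers = [tuple(r) for r in rng.sample(records, k)]
    assignments = None
    for _ in range(max_iter):
        # Step 3: each record goes to its nearest center
        # (Euclidean distance is an assumption of this sketch).
        new_assignments = [
            min(range(k), key=lambda j: math.dist(rec, centers[j]))
            for rec in records
        ]
        # Step 5: stop when no record changes cluster (assumed criterion).
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: move each center to the centroid (mean) of its records.
        for j in range(k):
            members = [rec for rec, a in zip(records, assignments) if a == j]
            if members:  # an empty cluster keeps its old center (assumption)
                centers[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centers, assignments


# Usage: two well-separated groups of two-dimensional records.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = k_means(points, k=2)
```

On this small data set the first three records end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initial centers.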