scikit-learn Cookbook - Third Edition
K-means is a centroid-based clustering algorithm that partitions data into a predefined number of clusters, k. That makes it a good fit here, given how blobby our data looked in the Introduction to Clustering section. First, K-means places k initial centroids in the feature space (at random, in the classic version of the algorithm). Next, it repeats two steps: it assigns each data point to its nearest centroid, and then it recalculates each centroid as the mean of the data points currently assigned to it, which moves that centroid through the feature space. This process continues until convergence, that is, until the centroids barely move and no data point is reassigned to a different centroid. K-means is efficient and works best when clusters are convex, isotropic, and roughly equal in size. Those same assumptions are also its greatest weakness when the data doesn't satisfy them. This recipe will walk you through this process.
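To make the assign-then-update loop concrete, here is a minimal from-scratch NumPy sketch of the classic Lloyd's algorithm. It is an illustration only, not scikit-learn's implementation. The function name `kmeans`, its parameters (`max_iter`, `tol`, `seed`), the choice to seed the centroids with k randomly picked data points, and the toy data are all assumptions made for this example.

```python
import numpy as np


def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal Lloyd's-algorithm K-means (illustrative sketch, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Initialization (assumed variant): k distinct, randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster simply keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence check: stop once the centroids barely move
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment against the final centroids
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)


# Hypothetical toy data: two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

On data this well separated, the loop settles on one centroid per group, each sitting at the mean of its three points. On messier data, the result can depend on the initial centroids, which is why practical implementations usually try several starts.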
Here, we’ll use the previous dummy data...
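For comparison, here is a minimal sketch of the same idea using scikit-learn's `KMeans` estimator. Because the earlier dummy data isn't reproduced here, this assumes a hypothetical stand-in generated with `make_blobs`. The sample count, number of centers, spread, and seeds are illustrative choices, not the recipe's values. Note that scikit-learn seeds its centroids with k-means++ by default rather than purely at random.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the "previous dummy data": three isotropic blobs in 2-D
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# n_init=10 runs K-means from 10 different initializations and keeps the
# solution with the lowest inertia (sum of squared distances to the nearest centroid)
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)

print(km.cluster_centers_)  # one row per centroid
print(km.inertia_)          # within-cluster sum of squared distances
print(km.n_iter_)           # iterations the best run needed to converge
```

Setting `n_init` explicitly keeps the behavior the same across scikit-learn versions, whose default for this parameter has changed over time.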