Rapid - Apache Mahout Clustering designs

Just as you cannot do engineering without math, you cannot start a discussion of clustering without K-means. It is one of the most basic and most useful clustering algorithms.

The algorithm is called K-means because it divides the dataset into K different clusters. The number of clusters is therefore fixed in advance. The K-means algorithm follows these steps:

1. The algorithm starts with the selection of the number of clusters, K.
2. It initializes K centroid points, one for each cluster.
3. Each data point is assigned to its closest centroid.
4. The centroid of each cluster is recomputed from the points assigned to it.
5. Steps 3 and 4 are repeated until convergence is reached.
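
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration only, not Mahout's implementation: the function name kmeans, the max_iterations and seed parameters, the random choice of initial centroids from the input points, the handling of empty clusters, and the use of Euclidean distance are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Illustrative K-means over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 2: initialize K centroids (assumption: K input points chosen at random).
    centroids = rng.sample(points, k)
    assignments = []
    for _ in range(max_iterations):
        # Step 3: assign every point to its closest centroid (Euclidean distance).
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Step 5: stop when no centroid moved in this iteration.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments
```

Calling kmeans(points, 2) on two well-separated groups of points returns two centroids and, for each point, the index of the cluster it was assigned to.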

Convergence is reached when the centroids do not move from one iteration to the next. In practice, we also provide a convergence threshold: if no centroid moves more than this distance in an iteration, we stop the algorithm.
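
The threshold test can be written as a small helper. The sketch below assumes centroids are stored as tuples of numbers in the same order in both lists and that movement is measured with Euclidean distance; the name has_converged and its parameters are hypothetical, chosen for illustration.

```python
import math


def has_converged(old_centroids, new_centroids, threshold):
    """Return True if no centroid moved farther than `threshold`."""
    return all(
        math.dist(old, new) <= threshold
        for old, new in zip(old_centroids, new_centroids)
    )
```

With a threshold of 0, this reduces to the strict condition that no centroid moves at all; a small positive threshold lets the algorithm stop earlier, once further iterations would change the centroids only marginally.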

The K-means...

