Many forms of clustering model are available, ranging from simple to extremely complex. Spark's MLlib currently provides k-means clustering, which is among the simplest approaches. It is nonetheless often very effective, and its simplicity makes it relatively easy to understand and to scale.
Types of clustering models
k-means clustering
k-means attempts to partition a set of data points into K distinct clusters (where K is an input parameter for the model).
More formally, k-means tries to find clusters that minimize the sum of squared errors (or distances) within each cluster. This objective function is known as the within-cluster sum of squared errors (WCSS).
It is the sum, over each cluster, of the squared errors between...
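To make the objective concrete, here is a minimal sketch in plain Python. It assumes the standard definition of WCSS: the error for a point is its squared Euclidean distance to the mean (centroid) of the cluster it belongs to. The function names and the sample points are hypothetical and chosen only for illustration; this is not MLlib's implementation.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def wcss(clusters):
    """Sum, over all clusters, of squared distances from each point to its cluster centroid."""
    total = 0.0
    for points in clusters:
        if not points:
            continue  # an empty cluster contributes nothing
        center = centroid(points)
        total += sum(squared_distance(p, center) for p in points)
    return total


# Hypothetical data: two clusters of 2-D points (K = 2).
clusters = [
    [(1.0, 1.0), (1.0, 3.0)],               # centroid (1, 2), contributes 2.0
    [(5.0, 5.0), (7.0, 5.0), (6.0, 8.0)],   # centroid (6, 6), contributes 8.0
]

print(wcss(clusters))  # 10.0
```

A lower WCSS means the points sit closer to their cluster centers, which is why k-means treats it as the quantity to minimize for a given K.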