Clojure for Data Science

By: Henry Garner


The drawbacks of k-means


k-means is one of the most popular clustering algorithms due to its relative ease of implementation and the fact that it can be made to scale well to very large datasets. In spite of its popularity, there are several drawbacks.

k-means is stochastic and is not guaranteed to find the globally optimal clustering. The algorithm can be very sensitive to outliers and noisy data, and the quality of the final clustering can depend heavily on the positions of the initial cluster centroids. In other words, k-means will regularly discover a local rather than a global minimum.
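
The dependence on initial centroids can be reproduced with a few lines of Clojure. The following is a minimal sketch, not the book's implementation: the function names (`sq-dist`, `nearest`, `centroid`, `step`, `k-means`, `sse`) and the four-point dataset are hypothetical and chosen only for illustration. It runs the same k-means loop from two different starting positions and compares the resulting sum of squared errors.

```clojure
(defn sq-dist
  "Squared Euclidean distance between two points given as vectors."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest
  "Returns the centroid closest to point p."
  [centroids p]
  (apply min-key #(sq-dist % p) centroids))

(defn centroid
  "Component-wise mean of a collection of points."
  [points]
  (let [n (count points)]
    (mapv #(/ % n) (apply map + points))))

(defn step
  "One k-means iteration: assign each point to its nearest centroid,
  then recompute each centroid as the mean of its points.
  Simplification: a centroid that attracts no points is dropped."
  [points centroids]
  (->> (group-by #(nearest centroids %) points)
       vals
       (map centroid)
       sort
       vec))

(defn k-means
  "Repeats `step` until the centroids stop changing."
  [points initial-centroids]
  (loop [centroids (vec (sort initial-centroids))]
    (let [next-centroids (step points centroids)]
      (if (= next-centroids centroids)
        centroids
        (recur next-centroids)))))

(defn sse
  "Sum of squared distances from each point to its nearest centroid."
  [points centroids]
  (reduce + (map #(sq-dist % (nearest centroids %)) points)))

;; Hypothetical data: the corners of a wide 4 x 1 rectangle.
(def points [[0.0 0.0] [0.0 1.0] [4.0 0.0] [4.0 1.0]])

;; Initial centroids on the left and right sides: left/right split.
(def good (k-means points [[0.0 0.0] [4.0 0.0]]))

;; Initial centroids in the middle of the long edges: top/bottom split.
(def poor (k-means points [[2.0 0.0] [2.0 1.0]]))

(sse points good) ;; => 1.0
(sse points poor) ;; => 16.0
```

Both runs reach a fixed point, so neither will improve with further iterations. The second start leaves each centroid midway between two distant points, and its sum of squared errors is sixteen times that of the first.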

The preceding diagram illustrates how k-means may converge to a local minimum based on poor initial cluster centroids. Non-optimal clustering may occur even if the initial cluster centroids are well placed, since k-means prefers clusters with similar sizes and densities. Where clusters are not approximately equal in size and density, k-means may fail to converge to the most natural...