Microsoft Azure Machine Learning

By: Sumit Mund, Christina Storm

Understanding the K-means clustering algorithm


The K-means clustering algorithm is one of the most popular clustering algorithms. It is simple and powerful. As the name suggests, the algorithm creates K clusters out of the dataset, where K is a number you choose. For simplicity, consider a dataset with two features, plotted in a two-dimensional space with one feature on the x axis and the other on the y axis. Note that because clustering is an unsupervised learning problem, no label, class, or dependent variable is required.

With the K-means algorithm, K centroids are determined, one for each of the K clusters. Every point in a cluster is closer to that cluster's centroid than to any other centroid.
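
One way to state this property precisely is sketched below. It assumes Euclidean distance (the text does not name a metric), and the symbols $x$ for a point, $c_1, \dots, c_K$ for the centroids, and $C_j$ for the cluster around $c_j$ are introduced here only for illustration:

```latex
x \in C_j \;\Longrightarrow\; \lVert x - c_j \rVert \le \lVert x - c_i \rVert
\quad \text{for all } i \in \{1, \dots, K\}
```

A point that is exactly equidistant from two centroids satisfies the inequality for both, so an implementation needs some tie-breaking rule, for example picking the lowest-numbered centroid.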

Consider K = 3, which gives 3 clusters and hence 3 centroids, as shown in the following figure. Intuitively, take any point and calculate its distance from each of the three centroids; the point belongs to the cluster whose centroid is nearest.
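
The nearest-centroid rule can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: it assumes Euclidean distance, and the function name `nearest_centroid` and the centroid coordinates are hypothetical values chosen for the example.

```python
import math


def nearest_centroid(point, centroids):
    """Return the 0-based index of the centroid closest to `point`.

    Assumes Euclidean distance; ties go to the lowest index.
    """
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


# Hypothetical centroids for K = 3 in a two-feature space.
centroids = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]

# Assign a sample point to the cluster whose centroid is nearest.
point = (4.0, 4.5)
cluster = nearest_centroid(point, centroids)
print(f"Point {point} belongs to cluster {cluster + 1}")
```

Note that the function returns a 0-based index, so the printed cluster number adds 1 to match the "Centroid 1, Centroid 2, ..." numbering used in the text.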

For a point, let d1 be the distance from Centroid 1, d2 be...