Mastering Python for Data Science

By: Samir Madhavan
The k-means clustering


k-means clustering is an unsupervised learning technique that partitions n observations into K clusters (buckets) of similar observations.

The algorithm gets its name from how it operates: it computes the mean of the features, that is, the variables on which we cluster things, over the observations in each cluster. For example, we might segment customers by their average transaction amount and the average number of products they purchase per quarter. This mean value then becomes the center of a cluster. The number K is the number of clusters: the technique computes K means, and the data is clustered around these K means.
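The way a mean becomes the center of a cluster can be sketched in a few lines of plain Python. This is a minimal illustration rather than the book's own code: the function name `kmeans`, its parameters, the random choice of starting centers, and the customer figures are all assumptions made up for the example.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical helper: basic k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Assumption: start from k observations picked at random as initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means will not move again
        labels = new_labels
        # Update step: each center becomes the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Made-up customers: (average transaction amount, average products per quarter).
customers = [
    (20, 2), (25, 3), (22, 2), (30, 3),          # small baskets
    (200, 10), (220, 12), (210, 11), (190, 9),   # large baskets
]

centers, labels = kmeans(customers, k=2)
print(sorted(centers))  # [(24.25, 2.5), (205.0, 10.5)]
```

Each of the two centers is the feature-wise mean of one customer segment. Note that the two features here are on different scales, so the transaction amount dominates the distance; in practice the features would usually be scaled first.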

How do we choose this K? If we have some idea of what we are looking for, or of how many clusters we expect or want, then we set K to that number before we start the engines and let the algorithm run.

If we don't know how many there are, then our exploration will take a little longer...