Mastering Python for Data Science

By: Samir Madhavan

The k-means algorithm and its working


The k-means clustering algorithm works by computing the mean of each feature, that is, of each variable we use for clustering, over the data points in a group. For example, we might segment customers by their average transaction amount and the average number of products they purchase in a quarter. This mean then becomes the center of a cluster. K is the number of clusters: the technique computes K means, and the data is clustered around these K means.
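To make the idea of a mean as a cluster center concrete, here is a minimal sketch in plain Python. The customer values, the cluster labels, and the helper name cluster_centers are hypothetical and chosen only for illustration; the sketch also assumes that every cluster has at least one member.

```python
from statistics import mean

# Hypothetical customers: (average transaction amount, average products per quarter)
customers = [(20.0, 2.0), (30.0, 4.0), (200.0, 10.0), (220.0, 14.0)]
# Hypothetical cluster labels, one per customer
labels = [0, 0, 1, 1]


def cluster_centers(points, labels, k):
    """Return the per-feature mean of the points assigned to each cluster."""
    centers = []
    for cluster in range(k):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        # zip(*members) groups the values feature by feature
        centers.append(tuple(mean(feature) for feature in zip(*members)))
    return centers


centers = cluster_centers(customers, labels, k=2)
print(centers)
```

Each center has one coordinate per feature, so with two features every cluster center is a point in the same two-dimensional space as the customers.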

How do we choose K? If we have some idea of what we are looking for, or of how many clusters we expect or want, then we can set K to that number before we start the engines and let the algorithm compute along.

If we don't know how many clusters there are, then our exploration will take a little longer and involve some trial and error, for example trying K = 3, 4, and 5.
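The trial-and-error approach could be sketched as follows. This is not the book's own code: it is a minimal plain-Python k-means that alternates between assigning points to their nearest center and averaging each group. The sample data, the helper names sq_dist and kmeans, the fixed number of iterations, and the use of the within-cluster sum of squares to compare values of K are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means sketch: returns (centers, within-cluster sum of squares)."""
    rng = random.Random(seed)
    # Start from k data points chosen at random
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty
        centers = [
            tuple(sum(f) / len(g) for f in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wcss


# Hypothetical two-feature data, for illustration only
data = [
    (1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
    (5.0, 5.1), (5.2, 4.9), (4.8, 5.0),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.8),
    (5.0, 9.0), (5.1, 9.2), (4.9, 8.8),
]

# Try several values of K and compare how tightly the points sit around their centers
results = {k: kmeans(data, k) for k in (3, 4, 5)}
for k, (centers, wcss) in results.items():
    print(f"K={k}: within-cluster sum of squares = {wcss:.3f}")
```

A lower within-cluster sum of squares means the points sit closer to their centers, but it tends to fall as K grows, so the numbers are best read as a guide to where adding clusters stops helping rather than as a score to minimize.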

The k-means algorithm is iterative. It starts by choosing K points at random from the data and uses these as cluster...