Mastering Python for Data Science

By: Samir Madhavan

Chapter 10. Applying Segmentation with k-means Clustering

Clustering is a form of unsupervised learning that segments instances into groups so that the instances within a group share similar characteristics. For example, Amazon might want to understand who its high-value, medium-value, and low-value users are. In the simplest form, we can determine this by bucketing each user's total transaction amount into three buckets: high-value customers fall in the top 20 percent, medium-value customers fall between the 20th and 80th percentiles, and low-value customers fall in the bottom 20 percent. This tells Amazon who its high-value customers are, so it can make sure they are taken care of in scenarios such as payment failures on transactions. Here, we've used a single variable, the transaction amount, and we've bucketed the data manually.
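
The manual bucketing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bucket_by_spend`, the user IDs, and the amounts are all hypothetical, and the cut points are taken at the 20th and 80th percentiles using the standard library's "inclusive" quantile method, which is one of several reasonable ways to define a percentile.

```python
from statistics import quantiles


def bucket_by_spend(totals):
    """Label each user 'low', 'medium', or 'high' by total transaction amount.

    `totals` maps a user ID to that user's total spend (hypothetical data).
    """
    # n=5 yields four cut points: the 20th, 40th, 60th, and 80th percentiles.
    cuts = quantiles(totals.values(), n=5, method="inclusive")
    low_cut, high_cut = cuts[0], cuts[-1]

    labels = {}
    for user, amount in totals.items():
        if amount >= high_cut:
            labels[user] = "high"      # top 20 percent
        elif amount <= low_cut:
            labels[user] = "low"       # bottom 20 percent
        else:
            labels[user] = "medium"    # everything in between
    return labels


# Made-up totals for ten users: 10.0, 20.0, ..., 100.0
totals = {f"user_{i}": 10.0 * i for i in range(1, 11)}
segments = bucket_by_spend(totals)
print(segments)
```

Note that the thresholds are chosen by hand and only one variable is considered, which is exactly the limitation the paragraph points out.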

We require an algorithm that can take multiple variables and help us bucket instances...