Mastering Python for Data Science

By: Samir Madhavan

Chapter 10. Applying Segmentation with k-means Clustering

Clustering is a form of unsupervised learning that segments instances into groups so that the instances within a group share similar characteristics. For example, Amazon might want to understand who its high-value, medium-value, and low-value users are. In the simplest form, we can determine this by bucketing each user's total transaction amount into three buckets: the high-value customers are the top 20 percent, the medium-value customers fall between the 20th and 80th percentiles, and the low-value customers are the bottom 20 percent. With this, Amazon knows who its high-value customers are and can ensure that they are taken care of in scenarios such as payment failures for transactions. Here, we've used a single variable, the transaction amount, and we've bucketed the data manually.
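The manual bucketing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer IDs, the transaction totals, and the `bucket_customers` helper are hypothetical, and the percentile cut points are computed with the standard library's `statistics.quantiles` rather than any method prescribed by the text.

```python
from statistics import quantiles


def bucket_customers(totals):
    """Label each customer as high, medium, or low value by total spend.

    `totals` maps a customer ID to that customer's total transaction amount.
    """
    # With n=5, quantiles() returns the 20th, 40th, 60th, and 80th
    # percentile cut points; only the first and last are needed here.
    cuts = quantiles(totals.values(), n=5, method="inclusive")
    low_cut, high_cut = cuts[0], cuts[-1]

    labels = {}
    for customer, amount in totals.items():
        if amount >= high_cut:
            labels[customer] = "high"      # top 20 percent
        elif amount <= low_cut:
            labels[customer] = "low"       # bottom 20 percent
        else:
            labels[customer] = "medium"    # 20th to 80th percentile
    return labels


# Hypothetical data: ten customers with totals of 100, 200, ..., 1000.
totals = {f"u{i}": 100 * i for i in range(1, 11)}
labels = bucket_customers(totals)
print(labels)
```

Note that this approach looks at one variable only and fixes the thresholds by hand, which is exactly the limitation that motivates a clustering algorithm.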

We require an algorithm that can take multiple variables and help us bucket instances...