12. Leveraging Python in the World of Big Data | Mastering Python for Data Science

Mastering Python for Data Science

By: Samir Madhavan

Overview of this book

Data science is a relatively new knowledge domain that various organizations use to make data-driven decisions. Data scientists have to wear various hats to work with data and derive value from it. The Python programming language, beyond having conquered the scientific community in the last decade, is now an indispensable tool for the data science practitioner and a must-know tool for every aspiring data scientist. Using Python will offer you a fast, reliable, cross-platform, and mature environment for data analysis, machine learning, and algorithmic problem solving.

This comprehensive guide helps you move beyond the hype and transcend the theory by providing a hands-on, advanced study of data science. Beginning with the essentials of Python in data science, you will learn to manage data and perform linear algebra in Python. You will move on to deriving inferences through inferential statistics and mining data to reveal hidden patterns and trends. You will use the matplotlib library to create high-end visualizations in Python and uncover the fundamentals of machine learning. Next, you will apply linear regression and logistic regression to your applications, before creating recommendation engines with various collaborative filtering algorithms and improving your predictions with ensemble methods. Finally, you will perform k-means clustering, analyze unstructured data with different text mining techniques, and leverage the power of Python in big data analytics.

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

1. Getting Started with Raw Data

The world of arrays with NumPy

Empowering data analysis with pandas

Data cleansing

Data operations

Summary

2. Inferential Statistics

Various forms of distribution

A z-score

A p-value

One-tailed and two-tailed tests

Type 1 and Type 2 errors

A confidence interval

Correlation

Z-test vs T-test

The F distribution

The chi-square distribution

The chi-square test of independence

ANOVA

Summary

3. Finding a Needle in a Haystack

What is data mining?

Presenting an analysis

Studying the Titanic

Summary

4. Making Sense of Data through Advanced Visualization

Controlling the line properties of a chart

Creating multiple plots

Playing with text

Styling your plots

Box plots

Heatmaps

Scatter plots with histograms

A scatter plot matrix

Area plots

Bubble charts

Hexagon bin plots

Trellis plots

A 3D plot of a surface

Summary

5. Uncovering Machine Learning

Different types of machine learning

Decision trees

Linear regression

Logistic regression

The naive Bayes classifier

The k-means clustering

Hierarchical clustering

Summary

6. Performing Predictions with a Linear Regression

Simple linear regression

Multiple regression

Training and testing a model

Summary

7. Estimating the Likelihood of Events

Logistic regression

Summary

8. Generating Recommendations with Collaborative Filtering

Recommendation data

User-based collaborative filtering

Item-based collaborative filtering

Summary

9. Pushing Boundaries with Ensemble Models

The census income dataset

Decision trees

Random forests

Summary

10. Applying Segmentation with k-means Clustering

The k-means algorithm and its working

The k-means clustering with countries

Clustering the countries

Summary

11. Analyzing Unstructured Data with Text Mining

Preprocessing data

Creating a wordcloud

Word and sentence tokenization

Parts of speech tagging

Stemming and lemmatization

The Stanford Named Entity Recognizer

Performing sentiment analysis on world leaders using Twitter

Summary

12. Leveraging Python in the World of Big Data

What is Hadoop?

Python MapReduce

File handling with Hadoopy

Pig

Python with Apache Spark

Summary

Index