Numerical Computing with Python

By: Pratap Dangeti, Allen Yu, Claire Chung, Aldrin Yim, Theodore Petrou

Overview of this book

Data mining, or parsing data to extract useful insights, is a niche skill that can transform your career as a data scientist. Python is a flexible programming language that is equipped with a strong suite of libraries and toolkits, and it gives you the perfect platform to sift through your data and mine the insights you seek. This Learning Path is designed to familiarize you with the Python libraries and the underlying statistics that you need to get comfortable with data mining. You will learn how to use pandas, Python's popular library, to analyze different kinds of data, and leverage the power of Matplotlib to generate appealing and impressive visualizations for the insights you have derived. You will also explore different machine learning techniques and statistics that enable you to build powerful predictive models. By the end of this Learning Path, you will have the perfect foundation to take your data mining skills to the next level and set yourself on the path to becoming a sought-after data science professional.

This Learning Path includes content from the following Packt products:

• Statistics for Machine Learning by Pratap Dangeti
• Matplotlib 2.x By Example by Allen Yu, Claire Chung, Aldrin Yim
• Pandas Cookbook by Theodore Petrou

Title Page

Contributors

About Packt

Preface

Journey from Statistics to Machine Learning

Statistical terminology for model building and validation

Summary

Tree-Based Machine Learning Models

Introducing decision tree classifiers

Comparison between logistic regression and decision trees

Comparison of error components across various styles of models

Remedial actions to push the model towards the ideal region

HR attrition data example

Decision tree classifier

Tuning class weights in decision tree classifier

Bagging classifier

Random forest classifier

Random forest classifier - grid search

AdaBoost classifier

Gradient boosting classifier

Comparison between AdaBoosting versus gradient boosting

Extreme gradient boosting - XGBoost classifier

Ensemble of ensembles - model stacking

Ensemble of ensembles with different types of classifiers

Ensemble of ensembles with bootstrap samples using a single type of classifier

Summary

K-Nearest Neighbors and Naive Bayes

K-nearest neighbors

KNN classifier with breast cancer Wisconsin data example

Tuning of k-value in KNN classifier

Naive Bayes

Probability fundamentals

Understanding Bayes theorem with conditional probability

Naive Bayes classification

Laplace estimator

Naive Bayes SMS spam classification example

Summary

Unsupervised Learning

K-means clustering

Principal Component Analysis - PCA

Singular value decomposition - SVD

Deep auto encoders

Model building technique using encoder-decoder architecture

Deep auto encoders applied on handwritten digits using Keras

Summary

Reinforcement Learning

Reinforcement learning basics

Markov decision processes and Bellman equations

Dynamic programming

Grid world example using value and policy iteration algorithms with basic Python

Monte Carlo methods

Temporal difference learning

SARSA on-policy TD control

Q-learning - off-policy TD control

Cliff walking example of on-policy and off-policy of TD control

Further reading

Summary

Hello Plotting World!

Hello Matplotlib!

Plotting our first graph

Summary

Visualizing Online Data

Typical API data formats

Introducing pandas

Visualizing the trend of data

Introducing Seaborn

Visualizing univariate distribution

Visualizing a bivariate distribution

Visualizing categorical data

Controlling Seaborn figure aesthetics

Summary

Visualizing Multivariate Data

Getting End-of-Day (EOD) stock data from Quandl

Two-dimensional faceted plots

Other two-dimensional multivariate plots

Three-dimensional (3D) plots

Summary

Adding Interactivity and Animating Plots

Scraping information from websites

Non-interactive backends

Interactive backends

Creating animated plots

Summary

Selecting Subsets of Data

Selecting Series data

Selecting DataFrame rows

Selecting DataFrame rows and columns simultaneously

Selecting data with both integers and labels

Speeding up scalar selection

Slicing rows lazily

Slicing lexicographically

Boolean Indexing

Calculating boolean statistics

Constructing multiple boolean conditions

Filtering with boolean indexing

Replicating boolean indexing with index selection

Selecting with unique and sorted indexes

Gaining perspective on stock prices

Translating SQL WHERE clauses

Determining the normality of stock market returns

Improving readability of boolean indexing with the query method

Preserving Series with the where method

Masking DataFrame rows

Selecting with booleans, integer location, and labels

Index Alignment

Examining the Index object

Producing Cartesian products

Exploding indexes

Filling values with unequal indexes

Appending columns from different DataFrames

Highlighting the maximum value from each column

Replicating idxmax with method chaining

Finding the most common maximum

Grouping for Aggregation, Filtration, and Transformation

Defining an aggregation

Grouping and aggregating with multiple columns and functions

Removing the MultiIndex after grouping

Customizing an aggregation function

Customizing aggregating functions with *args and **kwargs

Examining the groupby object

Filtering for states with a minority majority

Transforming through a weight loss bet

Calculating weighted mean SAT scores per state with apply

Grouping by continuous variables

Counting the total number of flights between cities

Finding the longest streak of on-time flights

Restructuring Data into a Tidy Form

Tidying variable values as column names with stack

Tidying variable values as column names with melt

Stacking multiple groups of variables simultaneously

Inverting stacked data

Unstacking after a groupby aggregation

Replicating pivot_table with a groupby aggregation

Renaming axis levels for easy reshaping

Tidying when multiple variables are stored as column names

Tidying when multiple variables are stored as column values

Tidying when two or more values are stored in the same cell

Tidying when variables are stored in column names and values

Tidying when multiple observational units are stored in the same table

Combining Pandas Objects

Appending new rows to DataFrames

Concatenating multiple DataFrames together

Comparing President Trump's and Obama's approval ratings

Understanding the differences between concat, join, and merge

Connecting to SQL databases

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Random forest classifier

Random forests improve on bagging with a small tweak that de-correlates the trees. In bagging, we build a number of decision trees on bootstrapped samples of the training data, but the one big drawback of the technique is that every tree considers all the variables. As a result, the order in which candidate variables are chosen for splitting remains more or less the same across the individual trees, so the trees end up correlated with one another. Averaging correlated individual models does not reduce variance effectively.
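To see why correlation limits the benefit of averaging, here is a small standard-library simulation (an illustration added for this section, not code from the book). Each hypothetical "tree prediction" has unit variance and pairwise correlation rho, built from one shared random component plus an independent one. For n such predictions, the variance of their average is rho + (1 - rho)/n, so it can never fall below rho however many trees are averaged. The function names and parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def variance_of_average(n_trees, rho, n_trials=5000, seed=0):
    """Empirical variance of the mean of n_trees unit-variance predictions
    whose pairwise correlation is rho (0 <= rho <= 1)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)  # component common to every tree
        preds = [
            # sqrt(rho) * shared + sqrt(1 - rho) * own noise -> variance 1, correlation rho
            (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * rng.gauss(0.0, 1.0)
            for _ in range(n_trees)
        ]
        means.append(sum(preds) / n_trees)
    return statistics.pvariance(means)


def theoretical_variance(n_trees, rho, sigma2=1.0):
    """Variance of the average of n_trees equally correlated predictions."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


if __name__ == "__main__":
    for rho in (0.0, 0.6):
        sim = variance_of_average(50, rho)
        print(f"rho={rho}: simulated={sim:.4f}, theory={theoretical_variance(50, rho):.4f}")
```

With 50 trees, theory predicts a variance of 0.02 for uncorrelated trees but 0.608 for trees with correlation 0.6; the simulated values should land close to these, which is the gap that de-correlating the trees aims to close.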

In random forest, during bootstrapping (repeated sampling with replacement), samples are drawn from the training data; not only are the observations (rows) randomly selected, as in bagging, but a few predictors/columns are also selected out of all the predictors (m predictors out of the total p predictors).
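The sampling step described above can be sketched in plain Python as follows. This is a minimal illustration with hypothetical helper names; it only draws indices and does not fit any trees. Each tree gets a bootstrap sample of row indices, drawn with replacement, plus m distinct column indices drawn without replacement from the p columns. Setting m equal to p reduces the scheme to plain bagging, where every tree sees all columns.

```python
import random


def draw_tree_sample(n_rows, n_features, m, rng):
    """Draw the data one tree is trained on.

    Returns (row_indices, feature_indices): n_rows row indices sampled with
    replacement (a bootstrap sample) and m distinct column indices sampled
    without replacement from the n_features available columns.
    """
    if not 1 <= m <= n_features:
        raise ValueError("m must be between 1 and n_features")
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    features = sorted(rng.sample(range(n_features), m))
    return rows, features


def draw_forest_samples(n_rows, n_features, m, n_trees, seed=0):
    """Draw one (rows, features) pair per tree, reproducibly from `seed`."""
    rng = random.Random(seed)
    return [draw_tree_sample(n_rows, n_features, m, rng) for _ in range(n_trees)]


if __name__ == "__main__":
    # Hypothetical sizes: 100 training rows, p = 20 predictors, m = 5, 3 trees.
    for rows, features in draw_forest_samples(100, 20, 5, 3, seed=42):
        print(f"{len(set(rows))} distinct rows, columns {features}")
```

This follows the per-tree description given here. Widely used implementations such as scikit-learn's RandomForestClassifier instead draw a fresh random subset of features at every split, controlled by the max_features parameter.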

The rule of thumb for selecting m variables out of the total p variables is m = sqrt...
