Hands-On Predictive Analytics with Python

By: Alvaro Fuentes

Overview of this book

Predictive analytics is an applied field that uses a variety of quantitative methods to make predictions from data. It involves much more than just throwing data onto a computer to build a model. This book provides practical coverage to help you understand the most important concepts of predictive analytics. Using practical, step-by-step examples, we build predictive analytics solutions with cutting-edge Python tools and packages. The book's step-by-step approach starts by defining the problem and moves on to identifying relevant data. We will also perform data preparation, explore and visualize relationships, and build, tune, evaluate, and deploy models. Each stage has relevant practical examples and efficient Python code. You will work with models such as KNN, Random Forests, and neural networks using the most important libraries in Python's data science stack: NumPy, Pandas, Matplotlib, Seaborn, Keras, Dash, and so on. In addition to hands-on code examples, you will find intuitive explanations of the inner workings of the main techniques and algorithms used in predictive analytics. By the end of this book, you will be all set to build high-performance predictive analytics solutions using Python.

Training versus testing error

The point of splitting the dataset into training and testing sets is to simulate using the model to make predictions on data it has not seen. As we said before, the goal is to generalize what we have learned from the observed data. The training MSE (or any metric calculated on the training dataset) may give us a biased view of the performance of our model, especially because of the possibility of overfitting: performance metrics computed on the training dataset tend to be too optimistic. Let's take a look again at our illustration of overfitting:

If we calculate the training MSE for these three cases, the third model, the polynomial of degree 16, will have the lowest (and hence apparently the best) value: as we can see, its curve passes through many of the points, making the error for those points exactly 0. However...
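The gap between training and testing error can be sketched with a small experiment. The following is a minimal illustration, not the book's own code or data: the dataset is synthetic (a sine curve plus noise), the degrees 1 and 3 are assumed stand-ins for the simpler models, and only the degree-16 polynomial comes from the text above. It fits each polynomial on a small training set and reports the MSE on both the training and the held-out testing points.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): smooth signal plus noise.
x = rng.uniform(0.0, 1.0, size=200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Shuffle and split; the training set is kept deliberately small
# so that overfitting is easy to see.
idx = rng.permutation(x.size)
train_idx, test_idx = idx[:30], idx[30:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


train_mse, test_mse = {}, {}
for degree in (1, 3, 16):  # degrees 1 and 3 are assumed; 16 is from the text
    # Least-squares polynomial fit using the training points only.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    train_mse[degree] = mse(y_train, model(x_train))
    test_mse[degree] = mse(y_test, model(x_test))
    print(f"degree {degree:2d}: "
          f"train MSE = {train_mse[degree]:.4f}, "
          f"test MSE = {test_mse[degree]:.4f}")
```

Because each higher-degree polynomial can reproduce any lower-degree fit, the training MSE can only go down as the degree grows. The testing MSE carries no such guarantee, which is why it is the more honest estimate of how the model will behave on unseen data.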