Python Machine Learning Cookbook - Second Edition

By : Giuseppe Ciaburro, Prateek Joshi

Overview of this book

This eagerly anticipated second edition of the popular Python Machine Learning Cookbook will enable you to adopt a fresh approach to dealing with real-world machine learning and deep learning tasks. With the help of over 100 recipes, you will learn to build powerful machine learning applications using modern libraries from the Python ecosystem. The book will also guide you on how to implement various machine learning algorithms for classification, clustering, and recommendation engines, using a recipe-based approach. With emphasis on practical solutions, dedicated sections in the book will help you to apply supervised and unsupervised learning techniques to real-world problems. Toward the concluding chapters, you will get to grips with recipes that teach you advanced techniques including reinforcement learning, deep neural networks, and automated machine learning. By the end of this book, you will be equipped with the skills you need to apply machine learning techniques and leverage the full capabilities of the Python ecosystem through real-world examples.

Preface

Who this book is for

What this book covers

To get the most out of this book

Sections

Get in touch

The Realm of Supervised Learning

Technical requirements

Introduction

Array creation in Python

Data preprocessing using mean removal

Building a linear regressor

Computing regression accuracy

Achieving model persistence

Building a ridge regressor

Building a polynomial regressor

Estimating housing prices

Computing the relative importance of features

Estimating bicycle demand distribution

Constructing a Classifier

Technical requirements

Introduction

Building a simple classifier

Building a logistic regression classifier

Building a Naive Bayes classifier

Splitting a dataset for training and testing

Evaluating accuracy using cross-validation metrics

Visualizing a confusion matrix

Extracting a performance report

Evaluating cars based on their characteristics

Extracting validation curves

Extracting learning curves

Estimating the income bracket

Predicting the quality of wine

Predictive Modeling

Technical requirements

Introduction

Building a linear classifier using SVMs

Building a nonlinear classifier using SVMs

Tackling class imbalance

Extracting confidence measurements

Finding optimal hyperparameters

Building an event predictor

Estimating traffic

Simplifying machine learning workflow using TensorFlow

Implementing a stacking method

Clustering with Unsupervised Learning

Technical requirements

Introduction

Clustering data using the k-means algorithm

Compressing an image using vector quantization

Grouping data using agglomerative clustering

Evaluating the performance of clustering algorithms

Estimating the number of clusters using the DBSCAN algorithm

Finding patterns in stock market data

Building a customer segmentation model

Using autoencoders to reconstruct handwritten digit images

Visualizing Data

Technical requirements

An introduction to data visualization

Plotting three-dimensional scatter plots

Plotting bubble plots

Animating bubble plots

Drawing pie charts

Plotting date-formatted time series data

Plotting histograms

Visualizing heat maps

Animating dynamic signals

Working with the Seaborn library

Building Recommendation Engines

Technical requirements

Introducing the recommendation engine

Building function compositions for data processing

Building machine learning pipelines

Finding the nearest neighbors

Constructing a k-nearest neighbors classifier

Constructing a k-nearest neighbors regressor

Computing the Euclidean distance score

Computing the Pearson correlation score

Finding similar users in the dataset

Generating movie recommendations

Implementing ranking algorithms

Building a filtering model using TensorFlow

Analyzing Text Data

Technical requirements

Introduction

Preprocessing data using tokenization

Stemming text data

Converting text to its base form using lemmatization

Dividing text using chunking

Building a bag-of-words model

Building a text classifier

Identifying the gender of a name

Analyzing the sentiment of a sentence

Identifying patterns in text using topic modeling

Parts of speech tagging with spaCy

Word2Vec using gensim

Shallow learning for spam detection

Speech Recognition

Technical requirements

Introducing speech recognition

Reading and plotting audio data

Transforming audio signals into the frequency domain

Generating audio signals with custom parameters

Synthesizing music

Extracting frequency domain features

Building HMMs

Building a speech recognizer

Building a TTS system

Dissecting Time Series and Sequential Data

Technical requirements

Introducing time series

Transforming data into a time series format

Slicing time series data

Operating on time series data

Extracting statistics from time series data

Building HMMs for sequential data

Building CRFs for sequential text data

Analyzing stock market data

Using RNNs to predict time series data

Analyzing Image Content

Technical requirements

Introducing computer vision

Operating on images using OpenCV-Python

Detecting edges

Histogram equalization

Detecting corners

Detecting SIFT feature points

Building a Star feature detector

Creating features using Visual Codebook and vector quantization

Training an image classifier using Extremely Random Forests

Building an object recognizer

Using Light GBM for image classification

Biometric Face Recognition

Technical requirements

Introduction

Capturing and processing video from a webcam

Building a face detector using Haar cascades

Building eye and nose detectors

Performing principal component analysis

Performing kernel principal component analysis

Performing blind source separation

Building a face recognizer using a local binary patterns histogram

Recognizing faces using the HOG-based model

Facial landmark recognition

User authentication by face recognition

Reinforcement Learning Techniques

Technical requirements

Introduction

Weather forecasting with MDP

Optimizing a financial portfolio using DP

Finding the shortest path

Deciding the discount factor using Q-learning

Implementing the deep Q-learning algorithm

Developing an AI-based dynamic modeling system

Deep reinforcement learning with double Q-learning

Deep Q-network algorithm with dueling Q-learning

Deep Neural Networks

Technical requirements

Introduction

Building a perceptron

Building a single layer neural network

Building a deep neural network

Creating a vector quantizer

Building a recurrent neural network for sequential data analysis

Visualizing the characters in an OCR database

Building an optical character recognizer using neural networks

Implementing optimization algorithms in ANN

Unsupervised Representation Learning

Technical requirements

Introduction

Using denoising autoencoders to detect fraudulent transactions

Generating word embeddings using CBOW and skipgram representations

Visualizing the MNIST dataset using PCA and t-SNE

Using word embedding for Twitter sentiment analysis

Implementing LDA with scikit-learn

Using LDA to classify text documents

Preparing data for LDA

Automated Machine Learning and Transfer Learning

Technical requirements

Introduction

Working with Auto-WEKA

Using AutoML to generate machine learning pipelines with TPOT

Working with Auto-Keras

Working with auto-sklearn

Using MLBox for selection and leak detection

Convolutional neural networks with transfer learning

Transfer learning with pretrained image classifiers using ResNet-50

Transfer learning using feature extraction with the VGG16 model

Transfer learning with pretrained GloVe embedding

Unlocking Production Issues

Technical requirements

Introduction

Handling unstructured data

Deploying machine learning models

Keeping track of changes into production

Tracking accuracy to optimize model scaling

Other Books You May Enjoy

Leave a review - let other readers know what you think

Building a linear regressor

Linear regression refers to finding the underlying function as a linear combination of the input variables. The example in this recipe uses a single input variable and a single output variable. Simple linear regression is easy to understand, yet it forms the basis of more advanced regression techniques. Once these concepts are understood, it will be easier for us to address other types of regression.

Consider the following diagram:

Linear regression consists of identifying the line that best represents the distribution of points in a two-dimensional plane. If the points corresponding to the observations lie close to this line, the chosen model describes the relationship between the variables effectively.

In theory, there is an infinite number of lines that could approximate the observations, but only one of them is optimal according to a given fitting criterion, such as least squares. When the relationship is linear, the observations of the y variable can be obtained as a linear function of the observations of the x variable. For each observation, we will use the following formula:

$$ y_i = \alpha x_i + \beta $$

In the preceding formula, x is the explanatory variable and y is the response variable. The α and β parameters, which represent the slope of the line and the intercept with the y-axis respectively, must be estimated based on the observations collected for the two variables included in the model.

The slope, α, is of particular interest: it is the change in the mean response for each unit increase in the explanatory variable. How should we interpret this coefficient? If the slope is positive, the regression line rises from left to right, so the mean response is higher when the explanatory variable is higher. If the slope is negative, the line falls from left to right, so the mean response is lower when the explanatory variable is higher. When the slope is zero, the explanatory variable has no linear effect on the response. It is not just the sign of α that matters, though: its magnitude tells us how strongly the response changes with the explanatory variable.
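To make the interpretation of α concrete, the following short sketch evaluates a straight line at x = 3 and x = 4 under three hypothetical slopes and reports how much the mean response changes for that unit step. The `predict` helper and the parameter values are illustrative assumptions, not part of the recipe.

```python
def predict(x, alpha, beta):
    """Return the mean response alpha * x + beta of a straight line."""
    return alpha * x + beta

# Hypothetical parameters, chosen only to illustrate the sign of the slope.
# Positive slope: a unit increase in x raises the mean response by alpha.
up = predict(4.0, alpha=2.0, beta=1.0) - predict(3.0, alpha=2.0, beta=1.0)
# Negative slope: a unit increase in x lowers the mean response.
down = predict(4.0, alpha=-0.5, beta=1.0) - predict(3.0, alpha=-0.5, beta=1.0)
# Zero slope: x has no effect on the mean response.
flat = predict(4.0, alpha=0.0, beta=1.0) - predict(3.0, alpha=0.0, beta=1.0)

print(up, down, flat)  # 2.0 -0.5 0.0
```

In each case the difference equals the slope itself, which is exactly what "change in the mean response per unit increase" means.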

The main aim of linear regression is to find the underlying linear model that connects the input variable to the output variable. It does so by minimizing the sum of the squared differences between the actual outputs and the outputs predicted by a linear function. This method is called ordinary least squares (OLS): the coefficients are estimated by finding the numerical values that minimize the sum of the squared deviations between the observed responses and the fitted responses, according to the following equation, where n is the number of observations:

$$ S(\alpha, \beta) = \sum_{i=1}^{n} \left( y_i - (\alpha x_i + \beta) \right)^2 $$

This quantity represents the sum of the squares of the distances to each experimental datum (x_i, y_i) from the corresponding point on the straight line.
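For a single explanatory variable, the OLS minimizer has a well-known closed form: α is the sum of the products of the deviations of x and y from their means, divided by the sum of the squared deviations of x, and β = ȳ - αx̄. The following sketch computes these estimates with NumPy and checks that nudging either parameter away from them increases the sum of squares. The helpers `ols_fit` and `sse` and the sample points are hypothetical, chosen only for illustration; scikit-learn's LinearRegression, used later in this recipe, solves the same least-squares problem with its own solver.

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form least-squares estimates of slope (alpha) and intercept (beta)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    alpha = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    beta = y_mean - alpha * x_mean
    return alpha, beta

def sse(x, y, alpha, beta):
    """Sum of squared vertical distances between the points and the line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((y - (alpha * x + beta)) ** 2)

# Made-up, slightly noisy points used only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha, beta = ols_fit(x, y)
best = sse(x, y, alpha, beta)

# Moving either parameter away from the OLS estimate increases S(alpha, beta)
assert best < sse(x, y, alpha + 0.1, beta)
assert best < sse(x, y, alpha, beta - 0.1)
print(alpha, beta)
```

For these points the estimates come out at about α = 1.99 and β = 0.05, close to the line y = 2x that the points were scattered around.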

You might argue that a curved line could fit these points better, but linear regression doesn't allow this. The main advantage of linear regression is its simplicity. Non-linear regression may give you more accurate models, but they are more complex and slower to fit. As shown in the preceding diagram, the model tries to approximate the input data points using a straight line. Let's see how to build a linear regression model in Python.

Getting ready

Regression is used to estimate the relationship between input data and continuous-valued output data. The outputs are generally real numbers, and our aim is to estimate the underlying function that maps the inputs to the outputs. Let's start with a very simple example. Consider the following mapping between input and output:

1 --> 2
3 --> 6
4.3 --> 8.6
7.1 --> 14.2

If I asked you to estimate the relationship between the inputs and the outputs, you could easily do so by analyzing the pattern. The output is twice the input in each case, so the transformation is as follows:

$$ f(x) = 2x $$

This is a simple function relating the input values to the output values. In the real world, however, relationships are rarely this straightforward!
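You can confirm this pattern numerically. The following sketch, which assumes NumPy is installed, fits a degree-1 polynomial to the four pairs listed above with `np.polyfit`; it is an alternative check, not a step from the recipe, which uses scikit-learn instead.

```python
import numpy as np

# The four input/output pairs listed above
x = np.array([1.0, 3.0, 4.3, 7.1])
y = np.array([2.0, 6.0, 8.6, 14.2])

# Fit a straight line; polyfit returns the coefficients highest power first
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)  # approximately 2 and 0
```

Because these points lie exactly on a line, the fit recovers f(x) = 2x up to floating-point rounding; real data will leave nonzero residuals.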

You have been provided with a data file called VehiclesItaly.txt. It contains comma-separated lines, where the first element is the input value and the second element is the corresponding output value. Our goal is to find the linear regression relation between the number of vehicle registrations in a region and the population of that region. You should use this file as the input to the script. The Registrations variable contains the number of vehicles registered in each Italian region, and the Population variable contains the population of that region.

How to do it...

Let's see how to build a linear regressor in Python:

Create a file called regressor.py and add the following lines:

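filename = "VehiclesItaly.txt"
X = []
y = []
with open(filename, 'r') as f:
    for line in f:
        # Skip blank lines, such as an empty line at the end of the file
        if not line.strip():
            continue
        # Each line holds "input,output"; convert both fields to float
        xt, yt = [float(i) for i in line.split(',')]
        X.append(xt)
        y.append(yt)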

We just loaded the input data into X and y, where X refers to the independent variable (explanatory variable) and y refers to the dependent variable (response variable). Inside the loop in the preceding code, we parse each line and split it on the comma delimiter. We then convert the two values into floating-point numbers and append them to X and y.
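If the file contains nothing but two numeric, comma-separated columns, NumPy can do the same parsing in one call. The sketch below is a possible alternative to the loop, assuming that file layout; it reads from an in-memory string whose numbers are invented placeholders, not values from VehiclesItaly.txt. With the real file, you would pass the filename instead, as in `np.loadtxt(filename, delimiter=",", unpack=True)`.

```python
import io
import numpy as np

# Placeholder text standing in for VehiclesItaly.txt; these numbers are
# invented for illustration and are NOT taken from the real file.
sample = io.StringIO("1.0,2.5\n2.0,4.5\n3.0,6.5\n")

# Each row is "input,output"; unpack=True returns the two columns separately
X, y = np.loadtxt(sample, delimiter=",", unpack=True)
print(X, y)
```

Unlike the loop, this returns NumPy arrays rather than lists; the slicing and `np.array(...)` calls used in the next step work with either.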

When we build a machine learning model, we need a way to validate our model and check whether it is performing at a satisfactory level. To do this, we need to separate our data into two groups: a training dataset and a testing dataset. The training dataset will be used to build the model, and the testing dataset will be used to see how this trained model performs on unseen data. So, let's go ahead and split this data into training and testing datasets:

import numpy as np

# Use the first 80% of the samples for training and the rest for testing
num_training = int(0.8 * len(X))
num_test = len(X) - num_training

# Training data (X is reshaped into a single-feature column)
X_train = np.array(X[:num_training]).reshape((num_training, 1))
y_train = np.array(y[:num_training])

# Test data
X_test = np.array(X[num_training:]).reshape((num_test, 1))
y_test = np.array(y[num_training:])

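First, we set aside the first 80% of the data for training and the remaining 20% for testing. Then, we built four arrays: X_train, X_test, y_train, and y_test. The input arrays are reshaped into a single column because scikit-learn expects the features as a 2D array of shape (n_samples, n_features).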
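Because the split above takes the rows in file order, the test set is always the last 20% of the file; if the file happens to be sorted in some way, the two sets may not be representative of each other. A common alternative, sketched below, is scikit-learn's `train_test_split`, which shuffles the data before splitting. The data here is hypothetical, and the resulting split will differ from the book's sequential one.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the values loaded from the file
X = np.arange(10, dtype=float)
y = 2.0 * X + 1.0

# scikit-learn expects a 2D feature array of shape (n_samples, n_features)
X_2d = X.reshape(-1, 1)

# Shuffle and split 80/20; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X_2d, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (8, 1) (2, 1)
```

Each input row stays paired with its own output after shuffling, so the train and test arrays remain aligned.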

We are now ready to train the model. Let's create a regressor object, as follows:

from sklearn import linear_model

# Create linear regression object
linear_regressor = linear_model.LinearRegression()

# Train the model using the training sets
linear_regressor.fit(X_train, y_train)

First, we imported the linear_model module from the sklearn library, which contains methods for regression in which the target value is expected to be a linear combination of the input variables. Then, we used the LinearRegression class, which performs ordinary least squares linear regression. Finally, the fit() method fits the linear model. It takes two parameters: the training data (X_train) and the target values (y_train).
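After fitting, the estimated parameters are available as attributes of the regressor: `coef_` holds the slope (α in this recipe's notation) and `intercept_` holds the intercept (β). The sketch below assumes the four input/output pairs from the Getting ready section as training data, so it should recover a slope close to 2 and an intercept close to 0.

```python
import numpy as np
from sklearn import linear_model

# The four input/output pairs from the Getting ready section
X = np.array([1.0, 3.0, 4.3, 7.1]).reshape(-1, 1)
y = np.array([2.0, 6.0, 8.6, 14.2])

linear_regressor = linear_model.LinearRegression()
linear_regressor.fit(X, y)

# coef_ holds one slope per feature; intercept_ is the fitted intercept
print(linear_regressor.coef_, linear_regressor.intercept_)  # approximately [2.] and 0.0
```

Inspecting these values on the VehiclesItaly.txt model tells you how many additional registrations the model predicts per additional inhabitant.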

We have now trained the linear regressor on our training data. To see how well the model fits, we use it to predict the outputs for the training data:

y_train_pred = linear_regressor.predict(X_train)

To plot the outputs, we will use the matplotlib library as follows:

import matplotlib.pyplot as plt
plt.figure()
plt.scatter(X_train, y_train, color='green')
plt.plot(X_train, y_train_pred, color='black', linewidth=4)
plt.title('Training data')
plt.show()

When you run this in the Terminal, the following diagram is shown:

In the preceding code, we used the trained model to predict the output for our training data. This doesn't tell us how the model performs on unseen data, because we are running it on the same data it was trained on; it only gives us an idea of how well the model fits the training data. It looks like it's doing okay, as you can see in the preceding diagram!

Let's predict the test dataset output based on this model and plot it, as follows:

y_test_pred = linear_regressor.predict(X_test)
plt.figure()
plt.scatter(X_test, y_test, color='green')
plt.plot(X_test, y_test_pred, color='black', linewidth=4)
plt.title('Test data')
plt.show()

When you run this in the Terminal, the following diagram is shown:

As you might expect, there's a positive association between a region's population and the number of vehicle registrations.
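Judging the plots by eye is a useful first check, but a number makes comparisons easier. The following sketch computes the mean squared error and the R² score on a test set with `sklearn.metrics`; the Computing regression accuracy recipe covers such metrics in more detail. The data is made up for illustration: the model is trained on points lying exactly on y = 2x + 1 and tested on two points that sit 0.5 above that line.

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Made-up training and test data, for illustration only
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])   # exactly y = 2x + 1
X_test = np.array([[5.0], [6.0]])
y_test = np.array([11.5, 13.5])            # 0.5 above the line

model = linear_model.LinearRegression().fit(X_train, y_train)
y_test_pred = model.predict(X_test)

# Quantify test performance instead of judging the plot by eye
mse = mean_squared_error(y_test, y_test_pred)
r2 = r2_score(y_test, y_test_pred)
print(mse, r2)
```

Here both test predictions miss by 0.5, giving an MSE of 0.25 and an R² of 0.75; running the same two calls on `y_test` and `y_test_pred` from the recipe gives the corresponding figures for the vehicle data.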

How it works...

In this recipe, we looked for the linear regression relation between the vehicle registrations in a region and the population of that region. To do this, we used the LinearRegression class from the linear_model module of the sklearn library. After constructing the model, we first used the training data to visually verify how well the model fits the data. Then, we used the test data to verify the results.

There's more...

A good way to appreciate the results of a model is to display them in charts. We have already done this in this recipe, with the scatter plot of the data and the fitted regression line. In Chapter 5, Visualizing Data, we will see other plots that allow us to check the model's assumptions.
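One such diagnostic is a residual plot, which shows the residuals (observed minus fitted values) against the fitted values. The sketch below draws one with matplotlib for made-up data; it is an illustrative assumption, not necessarily one of the plots used in Chapter 5. It selects the non-interactive Agg backend and saves the figure to residuals.png; for an interactive window, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of `savefig`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

# Made-up data for illustration only: a line plus Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=1.0, size=50)

model = linear_model.LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# A patternless band around zero is consistent with a linear relationship
# and roughly constant error variance; curves or funnels suggest otherwise
plt.figure()
plt.scatter(fitted, residuals, color='green')
plt.axhline(0.0, color='black', linewidth=1)
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.title('Residual plot')
plt.savefig('residuals.png')
```

With an intercept in the model, OLS residuals always average to zero, so what matters in this plot is their shape, not their mean.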
