Python Machine Learning Cookbook - Second Edition

By: Giuseppe Ciaburro, Prateek Joshi

Overview of this book

This eagerly anticipated second edition of the popular Python Machine Learning Cookbook will enable you to adopt a fresh approach to dealing with real-world machine learning and deep learning tasks. With the help of over 100 recipes, you will learn to build powerful machine learning applications using modern libraries from the Python ecosystem. The book will also guide you on how to implement various machine learning algorithms for classification, clustering, and recommendation engines, using a recipe-based approach. With emphasis on practical solutions, dedicated sections in the book will help you to apply supervised and unsupervised learning techniques to real-world problems. Toward the concluding chapters, you will get to grips with recipes that teach you advanced techniques including reinforcement learning, deep neural networks, and automated machine learning. By the end of this book, you will be equipped with the skills you need to apply machine learning techniques and leverage the full capabilities of the Python ecosystem through real-world examples.
Table of Contents (18 chapters)

Computing regression accuracy

Now that we know how to build a regressor, it's important to understand how to evaluate the quality of a regressor as well. In this context, an error is defined as the difference between the actual value and the value that is predicted by the regressor.
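As a minimal sketch of this definition, the following snippet computes the per-sample errors for a handful of made-up values (the numbers and variable names are illustrative assumptions, not data from the recipe). Signed errors can cancel each other out when averaged, which is why the metrics in this recipe work with absolute or squared errors:

```python
# Hypothetical actual and predicted values, for illustration only
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.5, 2.0, 8.0]

# Error = actual value - predicted value, one per data point
errors = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
print("Errors =", errors)

# Signed errors can cancel out, so their mean understates the misfit
mean_error = sum(errors) / len(errors)
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
print("Mean signed error =", mean_error)
print("Mean absolute error =", mean_abs_error)
```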

Getting ready

Let's quickly take a look at the metrics that can be used to measure the quality of a regressor. The scikit-learn library provides functions to compute all of the metrics discussed in this recipe in its sklearn.metrics module, which includes score functions, performance metrics, pairwise metrics, and distance computations.

How to do it...

Let's see how to compute regression accuracy in Python:

  1. We will now use the functions in the sklearn.metrics module to evaluate the performance of the linear regression model we developed in the previous recipe:
import sklearn.metrics as sm
# y_test holds the actual values; y_test_pred holds the model's predictions
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))

The following results are returned:

Mean absolute error = 241907.27
Mean squared error = 81974851872.13
Median absolute error = 240861.94
Explained variance score = 0.98
R2 score = 0.98

An R2 score near 1 means that the model is able to predict the data very well. Keeping track of every single metric can get tedious, so we pick one or two metrics to evaluate our model. A good practice is to make sure that the mean squared error is low and the explained variance score is high.
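Because the mean squared error is expressed in squared units of the target, it can be hard to read next to the absolute errors. As a small worked example (not part of the original recipe), taking the square root of the mean squared error reported above gives the root mean squared error, which is on the same scale as the mean absolute error:

```python
import math

# Mean squared error as reported in the results above
mse = 81974851872.13

# Root mean squared error: same units as the target variable
rmse = math.sqrt(mse)
print("Root mean squared error =", round(rmse, 2))
```

The result is roughly 286,000, which is the same order of magnitude as the mean absolute error of 241907.27.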

How it works...

A regressor can be evaluated using many different metrics, such as the following:

  • Mean absolute error: This is the average of absolute errors of all the data points in the given dataset.
  • Mean squared error: This is the average of the squares of the errors of all the data points in the given dataset. It is one of the most popular metrics out there!
  • Median absolute error: This is the median of the absolute errors of all the data points in the given dataset. The main advantage of this metric is that it's robust to outliers. A single bad point in the test dataset wouldn't skew the entire error metric, as it would with a mean-based error metric.
  • Explained variance score: This score measures how well our model can account for the variation in our dataset. A score of 1.0 indicates that our model is perfect.
  • R2 score: This is pronounced as R-squared, and this score refers to the coefficient of determination. This tells us how well the unknown samples will be predicted by our model. The best possible score is 1.0, but the score can be negative as well, which happens when the model predicts worse than simply returning the mean of the actual values.
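To make these definitions concrete, here is a minimal sketch that computes each metric by hand using only the Python standard library. The helper name regression_metrics and the sample values are assumptions made for illustration; in practice, the sklearn.metrics functions used earlier would be the usual choice:

```python
import statistics

def regression_metrics(y_true, y_pred):
    """Compute the five metrics from their definitions (illustrative helper)."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    sq_errors = [e ** 2 for e in errors]

    mean_true = statistics.fmean(y_true)
    # Total sum of squares: spread of the actual values around their mean
    ss_total = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "mean_absolute_error": statistics.fmean(abs_errors),
        "mean_squared_error": statistics.fmean(sq_errors),
        "median_absolute_error": statistics.median(abs_errors),
        # 1 - Var(errors) / Var(actual values), using population variance
        "explained_variance": 1 - statistics.pvariance(errors) / statistics.pvariance(y_true),
        # 1 - (sum of squared errors) / (total sum of squares)
        "r2": 1 - sum(sq_errors) / ss_total,
    }

# Hypothetical values, for illustration only
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.5, 2.0, 8.0]
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, "=", round(value, 4))
```

In this sketch, the explained variance and R2 values differ only because the errors do not average to zero; when the mean error is zero, the two scores coincide.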

There's more...

The sklearn.metrics module contains a series of simple functions that measure prediction error:

  • Functions ending with _score return a value to maximize; the higher the better
  • Functions ending with _error or _loss return a value to minimize; the lower the better
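One way this naming convention can be put to use is when comparing models programmatically. The following sketch infers whether to maximize or minimize from the metric function's name. The helpers higher_is_better and pick_best, as well as the result values, are hypothetical and not part of scikit-learn:

```python
def higher_is_better(metric_name):
    """Infer the optimization direction from a metric function's name.

    Hypothetical helper: it encodes only the naming convention described
    above and is not part of scikit-learn.
    """
    if metric_name.endswith("_score"):
        return True
    if metric_name.endswith(("_error", "_loss")):
        return False
    raise ValueError(f"No naming convention known for {metric_name!r}")

def pick_best(metric_name, results):
    """Return the model name with the best value for the given metric."""
    chooser = max if higher_is_better(metric_name) else min
    return chooser(results, key=results.get)

# Hypothetical metric values for two models, for illustration only
r2_results = {"model_a": 0.98, "model_b": 0.91}
mse_results = {"model_a": 12.5, "model_b": 20.1}
print(pick_best("r2_score", r2_results))
print(pick_best("mean_squared_error", mse_results))
```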

See also