Book Image

scikit-learn Cookbook - Second Edition

By : Trent Hauck
Book Image

scikit-learn Cookbook - Second Edition

By: Trent Hauck

Overview of this book

Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility, and within the Python data space, scikit-learn is the unequivocal choice for machine learning. This book includes walk throughs and solutions to the common as well as the not-so-common problems in machine learning, and how scikit-learn can be leveraged to perform various machine learning tasks effectively. The second edition begins with taking you through recipes on evaluating the statistical properties of data and generates synthetic data for machine learning modelling. As you progress through the chapters, you will comes across recipes that will teach you to implement techniques like data pre-processing, linear regression, logistic regression, K-NN, Naïve Bayes, classification, decision trees, Ensembles and much more. Furthermore, you’ll learn to optimize your models with multi-class classification, cross validation, model evaluation and dive deeper in to implementing deep learning with scikit-learn. Along with covering the enhanced features on model section, API and new features like classifiers, regressors and estimators the book also contains recipes on evaluating and fine-tuning the performance of your model. By the end of this book, you will have explored plethora of features offered by scikit-learn for Python to solve any machine learning problem you come across.
Table of Contents (13 chapters)

Plotting with NumPy and matplotlib

A simple way to make visualizations with NumPy is by using the library matplotlib. Let's make some visualizations quickly.

Getting ready

Start by importing numpy and matplotlib. You can view visualizations within an IPython Notebook using the %matplotlib inline command:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

How to do it...

  1. The main command in matplotlib, in pseudo code, is as follows:
plt.plot(numpy array, numpy array of same length)
  1. Plot a straight line by placing two NumPy arrays of the same length:
plt.plot(np.arange(10), np.arange(10))
  1. Plot an exponential:
plt.plot(np.arange(10), np.exp(np.arange(10)))
  1. Place the two graphs side by side:
plt.figure()
plt.subplot(121)
plt.plot(np.arange(10), np.exp(np.arange(10)))
plt.subplot(122)
plt.scatter(np.arange(10), np.exp(np.arange(10)))

Or top to bottom:

plt.figure()
plt.subplot(211)
plt.plot(np.arange(10), np.exp(np.arange(10)))
plt.subplot(212)
plt.scatter(np.arange(10), np.exp(np.arange(10)))

The first two numbers in the subplot command refer to the grid size in the figure instantiated by plt.figure(). The grid size referred to in plt.subplot(221) is 2 x 2, the first two digits. The last digit refers to traversing the grid in reading order: left to right and then up to down.

  1. Plot in a 2 x 2 grid traversing in reading order from one to four:
plt.figure()
plt.subplot(221)
plt.plot(np.arange(10), np.exp(np.arange(10)))
plt.subplot(222)
plt.scatter(np.arange(10), np.exp(np.arange(10)))
plt.subplot(223)
plt.scatter(np.arange(10), np.exp(np.arange(10)))
plt.subplot(224)
plt.scatter(np.arange(10), np.exp(np.arange(10)))
  1. Finally, with real data:
from sklearn.datasets import load_iris

iris = load_iris()
data = iris.data
target = iris.target

# Resize the figure for better viewing
plt.figure(figsize=(12,5))

# First subplot
plt.subplot(121)

# Visualize the first two columns of data:
plt.scatter(data[:,0], data[:,1], c=target)

# Second subplot
plt.subplot(122)

# Visualize the last two columns of data:
plt.scatter(data[:,2], data[:,3], c=target)

The c parameter takes an array of colors—in this case, the colors 0, 1, and 2 in the iris target: