Mastering Machine Learning with scikit-learn

Mastering Machine Learning with scikit-learn - Second Edition

By: Gavin Hackeling

Overview of this book

Machine learning brings computer science and statistics together to build smart and efficient models. Using the algorithms and techniques offered by machine learning, you can automate the construction of many analytical models. This book examines a variety of machine learning models, including popular algorithms such as k-nearest neighbors, logistic regression, naive Bayes, k-means, decision trees, and artificial neural networks. It discusses data preprocessing, hyperparameter optimization, and ensemble methods. You will build systems that classify documents, recognize images, detect ads, and more. You will learn to use scikit-learn's API to extract features from categorical variables, text, and images; evaluate model performance; and develop an intuition for how to improve your models' performance. By the end of this book, you will have mastered the scikit-learn concepts needed to build efficient models and apply them to advanced, practical tasks at work.

Title Page

Credits

About the Author

About the Reviewer

www.PacktPub.com

Customer Feedback

Preface

The Fundamentals of Machine Learning

Defining machine learning

Learning from experience

Machine learning tasks

Training data, testing data, and validation data

Bias and variance

An introduction to scikit-learn

Installing scikit-learn

Installing pandas, Pillow, NLTK, and matplotlib

Summary

Simple Linear Regression

Simple linear regression

Evaluating the model

Summary

Classification and Regression with k-Nearest Neighbors

K-Nearest Neighbors

Lazy learning and non-parametric models

Classification with KNN

Regression with KNN

Summary

Feature Extraction

Extracting features from categorical variables

Standardizing features

Extracting features from text

Extracting features from images

Summary

From Simple Linear Regression to Multiple Linear Regression

Multiple linear regression

Polynomial regression

Regularization

Applying linear regression

Gradient descent

Summary

From Linear Regression to Logistic Regression

Binary classification with logistic regression

Spam filtering

Tuning models with grid search

Multi-class classification

Multi-label classification and problem transformation

Summary

Naive Bayes

Bayes' theorem

Generative and discriminative models

Naive Bayes

Naive Bayes with scikit-learn

Summary

Nonlinear Classification and Regression with Decision Trees

Decision trees

Training decision trees

Decision trees with scikit-learn

Summary

From Decision Trees to Random Forests and Other Ensemble Methods

Bagging

Boosting

Stacking

Summary

The Perceptron

The perceptron

Limitations of the perceptron

Summary

From the Perceptron to Support Vector Machines

Kernels and the kernel trick

Maximum margin classification and support vectors

Classifying characters in scikit-learn

Summary

From the Perceptron to Artificial Neural Networks

Nonlinear decision boundaries

Feed-forward and feedback ANNs

Multi-layer perceptrons

Training multi-layer perceptrons

Summary

K-means

Clustering

K-means

Evaluating clusters

Image quantization

Clustering to learn features

Summary

Dimensionality Reduction with Principal Component Analysis

Principal component analysis

Visualizing high-dimensional data with PCA

Face recognition with PCA

Summary

Index

Learning from experience

Machine learning systems are often described as learning from experience either with or without supervision from humans. In supervised learning problems, a program predicts an output for an input by learning from pairs of labeled inputs and outputs. That is, the program learns from examples of the "right answers". In unsupervised learning, a program does not learn from labeled data. Instead, it attempts to discover patterns in data. For example, assume that you have collected data describing the heights and weights of people. An example of an unsupervised learning problem is dividing the data points into groups. A program might produce groups that correspond to men and women, or children and adults. Now assume that the data is also labeled with the person's sex. An example of a supervised learning problem is to induce a rule for predicting whether a person is male or female based on his or her height and weight. We will discuss algorithms and examples of supervised and unsupervised learning in the following chapters.
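The height-and-weight example can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not code from the book: the six data points and their labels are invented, and KMeans and KNeighborsClassifier are simply convenient stand-ins for the unsupervised and supervised algorithms covered in later chapters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: [height in cm, weight in kg] for six people.
# These values are invented for illustration only.
X = np.array([
    [158, 52], [162, 55], [160, 54],  # one loose group
    [180, 82], [185, 88], [178, 80],  # another loose group
])

# Unsupervised: no labels are given; KMeans only looks for structure in X.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)
# Cluster ids are arbitrary integers; points in the same group share an id.
print(clusters)

# Supervised: the same points now come with labels (the "right answers").
y = np.array(["female", "female", "female", "male", "male", "male"])
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

# Predict labels for two new, unlabeled people.
predictions = clf.predict(np.array([[165, 58], [182, 85]]))
print(predictions)
```

The clustering step never sees y, so it can only report which points belong together; the classifier, having learned from labeled pairs, can name the group a new person belongs to.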

Supervised learning and unsupervised learning can be thought of as occupying opposite ends of a spectrum. Some types of problems, called semi-supervised learning problems, make use of both supervised and unsupervised data; these problems are located on the spectrum between supervised and unsupervised learning. Reinforcement learning is located near the supervised end of the spectrum. Unlike supervised learning, reinforcement learning programs do not learn from labeled pairs of inputs and outputs. Instead, they receive feedback for their decisions, but errors are not explicitly corrected. For example, a reinforcement learning program that is learning to play a side-scrolling video game like Super Mario Bros may receive a reward when it completes a level or exceeds a certain score, and a punishment when it loses a life. However, this supervised feedback is not associated with specific decisions to run, avoid Goombas, or pick up fire flowers. We will focus primarily on supervised and unsupervised learning, as these categories include the most common machine learning problems. In the next sections, we will review supervised and unsupervised learning in more detail.

A supervised learning program learns from labeled examples of the outputs that should be produced for an input. There are many names for the output of a machine learning program. Several disciplines converge in machine learning, and many of those disciplines use their own terminology. In this book, we will refer to the output as the response variable. Other names for response variables include "dependent variables", "regressands", "criterion variables", "measured variables", "responding variables", "explained variables", "outcome variables", "experimental variables", "labels", and "output variables". Similarly, the input variables have several names. In this book, we will refer to inputs as features, and the phenomena they represent as explanatory variables. Other names for explanatory variables include "predictors", "regressors", "controlled variables", and "exposure variables". Response variables and explanatory variables may take real or discrete values.
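In scikit-learn, features and response variables are conventionally passed to an estimator as an array X of shape (n_samples, n_features) and an array y of shape (n_samples,). The sketch below uses invented values to show a real-valued response fit by LinearRegression and a discrete response fit by LogisticRegression; these two estimators are chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data, invented for illustration.
# Features: one row per example, one column per explanatory variable.
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)

# A real-valued response variable (here exactly y = 2x + 1).
y_real = np.array([3.0, 5.0, 7.0, 9.0])     # shape (n_samples,)
regressor = LinearRegression().fit(X, y_real)
real_prediction = regressor.predict(np.array([[5.0]]))
print(X.shape, y_real.shape, real_prediction)

# A discrete response variable (class labels 0 and 1).
y_discrete = np.array([0, 0, 1, 1])
classifier = LogisticRegression().fit(X, y_discrete)
discrete_prediction = classifier.predict(np.array([[0.0], [5.0]]))
print(discrete_prediction)
```

The same feature matrix serves both tasks; only the type of response variable determines whether a regressor or a classifier is appropriate.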

The collection of examples that comprises supervised experience is called a training set. A collection of examples that is used to assess the performance of a program is called a test set. The response variable can be thought of as the answer to the question posed by the explanatory variables; supervised learning programs learn from a collection of answers to different questions. That is, supervised learning programs are provided with the correct answers and must learn to respond correctly to unseen, but similar, questions.
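Splitting labeled examples into a training set and a test set might look like the following sketch. The ten examples are invented; train_test_split and KNeighborsClassifier are real scikit-learn APIs, but the 70/30 split and k=3 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labeled examples: [height in cm, weight in kg] -> label.
# Values are invented for illustration only.
X = np.array([
    [158, 52], [162, 55], [160, 54], [155, 50], [165, 57],
    [180, 82], [185, 88], [178, 80], [183, 84], [176, 79],
])
y = np.array(["female"] * 5 + ["male"] * 5)

# Hold out 30% of the examples as a test set; the rest form the training set.
# stratify=y keeps both labels represented in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)             # learn only from the training set
accuracy = clf.score(X_test, y_test)  # assess on examples not seen in training
print(len(X_train), len(X_test), accuracy)
```

Because the test examples are withheld during fitting, the score estimates how well the program answers unseen, but similar, questions rather than how well it memorized the training set.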