Mastering Machine Learning with scikit-learn

By : Gavin Hackeling

Overview of this book

This book examines machine learning models including logistic regression, decision trees, and support vector machines, and applies them to common problems such as categorizing documents and classifying images. It begins with the fundamentals of machine learning, introducing you to the supervised-unsupervised spectrum, the uses of training and test data, and evaluating models. You will learn how to use generalized linear models in regression problems, as well as solve problems with text and categorical features. You will be acquainted with the use of logistic regression, regularization, and the various loss functions that are used by generalized linear models. The book will also walk you through an example project that prompts you to label the most uncertain training examples. You will also use an unsupervised Hidden Markov Model to predict stock prices. By the end of the book, you will be an expert in scikit-learn and will be well versed in machine learning.


Spam filtering

Our first problem is a modern version of the canonical binary classification problem: spam classification. In our version, however, we will classify spam and ham SMS messages rather than e-mail. We will extract TF-IDF features from the messages using techniques you learned in Chapter 3, Feature Extraction and Preprocessing, and classify the messages using logistic regression.
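Before turning to the real data, the overall approach can be sketched on a handful of messages. This is only a minimal illustration, not the chapter's own listing: the four toy messages and their labels are hypothetical and are not taken from the SMS data set, and both TfidfVectorizer and LogisticRegression are used with their default settings, which is an assumption rather than a recommendation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy messages and labels, invented for illustration only
messages = [
    'WINNER! Claim your free prize now',
    'Free entry in a weekly prize draw, text WIN now',
    'Are we still meeting for lunch today?',
    'Ok, see you at home later',
]
labels = ['spam', 'spam', 'ham', 'ham']

# Learn the vocabulary and convert each message to a TF-IDF feature vector
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(messages)

# Fit a logistic regression classifier on the TF-IDF features
classifier = LogisticRegression()
classifier.fit(X, labels)

# New messages must be transformed with the already-fitted vectorizer
new_messages = ['Claim your free prize', 'See you at lunch']
predictions = classifier.predict(vectorizer.transform(new_messages))
for message, prediction in zip(new_messages, predictions):
    print('%s -> %s' % (message, prediction))
```

Note that the vectorizer is fitted once on the training messages and only applied, not refitted, to unseen messages, so that training and prediction share the same vocabulary.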

We will use the SMS Spam Collection Data Set from the UCI Machine Learning Repository. The data set can be downloaded from http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection. First, let's explore the data set and calculate some basic summary statistics using pandas:

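>>> import pandas as pd
>>> # The file is tab-separated with no header row: column 0 is the label, column 1 is the message
>>> df = pd.read_csv('data/SMSSpamCollection', delimiter='\t', header=None)
>>> print(df.head())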

      0                                                  1
0   ham  Go until jurong point, crazy.. Available only ...
1   ham                      Ok lar... Joking wif u oni...
2  spam...
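One basic summary statistic for a labelled data set like this is the number of messages in each class. The following is a minimal sketch of how that might be computed; it assumes the same two-column, tab-separated layout as above, and it reads a small hypothetical in-memory sample rather than the real file, so the counts it prints say nothing about the actual data set.

```python
import io

import pandas as pd

# Hypothetical stand-in for the tab-separated file: label, then message
raw = (
    'ham\tOk, see you later\n'
    'spam\tFree prize, reply WIN\n'
    'ham\tRunning late\n'
)
df = pd.read_csv(io.StringIO(raw), delimiter='\t', header=None)

# Column 0 holds the class label; count how often each label occurs
counts = df[0].value_counts()
print('Number of spam messages: %s' % counts['spam'])
print('Number of ham messages: %s' % counts['ham'])
```

Checking the class balance in this way is worthwhile before training, because a heavily skewed split between ham and spam affects how accuracy should be interpreted.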
