Machine Learning with scikit-learn Quick Start Guide

By: Kevin Jolly

Overview of this book

Scikit-learn is a robust machine learning library for the Python programming language. It provides a set of supervised and unsupervised learning algorithms. This book teaches you how to deploy, optimize, and evaluate the important machine learning algorithms that scikit-learn provides. You will start by setting up and configuring your machine learning environment with scikit-learn. You will then learn how to implement various supervised and unsupervised machine learning models, using classification, regression, and clustering techniques to work with different types of datasets and train your models. Finally, you will learn about an effective pipeline to help you build a machine learning project from scratch. By the end of this book, you will be confident in building your own machine learning models for accurate predictions.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Introducing Machine Learning with scikit-learn

A brief introduction to machine learning

What is scikit-learn?

Installing scikit-learn

Algorithms that you will learn to implement using scikit-learn

Summary

Predicting Categories with K-Nearest Neighbors

Technical requirements

Preparing a dataset for machine learning with scikit-learn

The k-NN algorithm

Implementing the k-NN algorithm using scikit-learn

Fine-tuning the parameters of the k-NN algorithm

Scaling for optimized performance

Summary

Predicting Categories with Logistic Regression

Technical requirements

Understanding logistic regression mathematically

Implementing logistic regression using scikit-learn

Fine-tuning the hyperparameters

Scaling the data

Interpreting the logistic regression model

Summary

Predicting Categories with Naive Bayes and SVMs

Technical requirements

The Naive Bayes algorithm

Support vector machines

Summary

Predicting Numeric Outcomes with Linear Regression

Technical requirements

The inner mechanics of the linear regression algorithm

Implementing linear regression in scikit-learn

Model optimization

Summary

Classification and Regression with Trees

Technical requirements

Classification trees

Regression trees

Ensemble classifier

Summary

Clustering Data with Unsupervised Machine Learning

Technical requirements

The k-means algorithm

Implementing the k-means algorithm in scikit-learn

Feature engineering for optimization

Cluster visualization

Going from unsupervised to supervised learning

Summary

Performance Evaluation Methods

Technical requirements

Why is performance evaluation critical?

Performance evaluation for classification algorithms

Performance evaluation for regression algorithms

Performance evaluation for unsupervised algorithms

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

A brief introduction to machine learning

Machine learning has generated quite a buzz, from Elon Musk fearing the role of unregulated artificial intelligence in society to Mark Zuckerberg taking a view that contradicts Musk's.

So, what exactly is machine learning? Simply put, machine learning is a set of methods that can detect patterns in data and use those patterns to make future predictions. Machine learning has proven valuable across a wide range of industries, from finance to healthcare, which in turn has increased the demand for people skilled in the field.

Broadly speaking, machine learning can be categorized into three main types:

Supervised learning
Unsupervised learning
Reinforcement learning

Scikit-learn is designed to tackle problems pertaining to supervised and unsupervised learning only, and does not support reinforcement learning at present.

Supervised learning

Supervised learning is a form of machine learning in which the data comes with a target variable: either a set of labels (categories) or a numeric value. The target usually occupies a single feature/attribute of the dataset. For instance, each row of your data could belong to either the Healthy or the Not Healthy category.

Given a set of features such as weight, blood sugar levels, and age, we can use a supervised machine learning algorithm to predict whether the person is healthy or not.
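As a minimal sketch of this idea, the snippet below fits a k-nearest neighbors classifier on a tiny labeled dataset and predicts the category of two new people. The numbers are made up purely for illustration (they are not from the book), and the choice of `KNeighborsClassifier` with `n_neighbors=3` is an assumption; any scikit-learn classifier would follow the same `fit`/`predict` pattern.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: weight (kg), blood sugar (mg/dL), age (years).
X = np.array([
    [62, 85, 25],
    [70, 90, 31],
    [68, 88, 40],
    [95, 160, 52],
    [102, 175, 60],
    [88, 150, 47],
])

# Target variable: one label per row of X.
y = np.array([
    "Healthy", "Healthy", "Healthy",
    "Not Healthy", "Not Healthy", "Not Healthy",
])

# Learn from the labeled examples.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Predict labels for two new, unlabeled people.
new_people = np.array([[65, 87, 29], [98, 170, 55]])
predictions = model.predict(new_people)
print(predictions)
```

The features here are left unscaled to keep the sketch short; with real data, features measured on different scales would normally be scaled before fitting a distance-based model.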

In the following simple mathematical expression, S is the supervised learning algorithm, X is the set of input features, such as weight and age, and Y is the target variable with the labels Healthy or Not Healthy:

Y = S(X)

Although supervised learning is the most common type of machine learning implemented with scikit-learn and in industry, most datasets do not come with predefined labels. In such cases, unsupervised learning algorithms can first be used to cluster the unlabeled data into distinct groups, to which we can then assign labels. This is discussed in detail in the following section.

Unsupervised learning

Unsupervised learning is a form of machine learning in which the algorithm tries to find patterns in data that has no outcome/target variable. In other words, the data does not come with pre-existing labels. Instead, the algorithm typically uses a metric such as distance to group data points together depending on how close they are to each other.
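To make the notion of distance concrete, here is a small sketch that computes the Euclidean distance between data points. The three points and the helper name `euclidean` are hypothetical, and Euclidean distance is only one possible metric, chosen here as an assumption.

```python
import numpy as np

# Hypothetical people described by weight (kg) and age (years).
a = np.array([60.0, 25.0])
b = np.array([63.0, 29.0])
c = np.array([95.0, 60.0])

def euclidean(p, q):
    """Straight-line distance between two feature vectors."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

d_ab = euclidean(a, b)  # a and b are close together
d_ac = euclidean(a, c)  # a and c are far apart
print(d_ab, d_ac)
```

Because `a` is much closer to `b` than to `c`, a distance-based algorithm would tend to place `a` and `b` in the same group.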

As discussed in the previous section, most of the data that you will encounter in the real world will not come with a set of predefined labels and, as such, will only have a set of input features without a target attribute.

In the following simple mathematical expression, U is the unsupervised learning algorithm, while X is a set of input features, such as weight and age:

U(X)

Given this data, our objective is to create groups that could potentially be labeled as Healthy or Not Healthy. The unsupervised learning algorithm will use a metric such as distance to identify how close a set of points are to each other and how far apart two such groups are. The algorithm will then proceed to cluster the points into two distinct groups, as illustrated in the following diagram:

[Figure: Clustering two groups together]
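The grouping described above can be sketched with scikit-learn's `KMeans`. The data points are invented for illustration, and k-means is used here as one example of a distance-based clustering algorithm; the settings `n_clusters=2`, `n_init=10`, and `random_state=0` are assumptions made for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: weight (kg) and age (years).
X = np.array([
    [60, 25],
    [63, 29],
    [58, 31],
    [95, 60],
    [101, 55],
    [98, 63],
], dtype=float)

# Ask for two groups; no target variable is passed to the algorithm.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(cluster_ids)              # one cluster index (0 or 1) per row
print(kmeans.cluster_centers_)  # the center of each group
```

The cluster indices are arbitrary integers. Deciding which group corresponds to Healthy and which to Not Healthy is a separate step that the algorithm does not perform.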
