Machine Learning with scikit-learn Quick Start Guide

By: Kevin Jolly

Overview of this book

Scikit-learn is a robust machine learning library for the Python programming language that provides a range of supervised and unsupervised learning algorithms. This book teaches you how to use scikit-learn to deploy, optimize, and evaluate the most important of these algorithms. You will start by setting up and configuring your machine learning environment with scikit-learn. You will then implement various supervised and unsupervised machine learning models, learning classification, regression, and clustering techniques to train models on different types of datasets. Finally, you will learn how to build an effective pipeline that takes a machine learning project from scratch to completion. By the end of this book, you will be confident in building your own machine learning models for accurate predictions.

Preface

Who this book is for

What this book covers

To get the most out of this book

Introducing Machine Learning with scikit-learn

A brief introduction to machine learning

What is scikit-learn?

Installing scikit-learn

Algorithms that you will learn to implement using scikit-learn

Predicting Categories with K-Nearest Neighbors

Technical requirements

Preparing a dataset for machine learning with scikit-learn

The k-NN algorithm

Implementing the k-NN algorithm using scikit-learn

Fine-tuning the parameters of the k-NN algorithm

Scaling for optimized performance

Predicting Categories with Logistic Regression

Technical requirements

Understanding logistic regression mathematically

Implementing logistic regression using scikit-learn

Fine-tuning the hyperparameters

Scaling the data

Interpreting the logistic regression model

Predicting Categories with Naive Bayes and SVMs

Technical requirements

The Naive Bayes algorithm

Support vector machines

Predicting Numeric Outcomes with Linear Regression

Technical requirements

The inner mechanics of the linear regression algorithm

Implementing linear regression in scikit-learn

Model optimization

Classification and Regression with Trees

Technical requirements

Classification trees

Regression trees

Ensemble classifier

Clustering Data with Unsupervised Machine Learning

Technical requirements

The k-means algorithm

Implementing the k-means algorithm in scikit-learn

Feature engineering for optimization

Cluster visualization

Going from unsupervised to supervised learning

Performance Evaluation Methods

Technical requirements

Why is performance evaluation critical?

Performance evaluation for classification algorithms

Performance evaluation for regression algorithms

Performance evaluation for unsupervised algorithms

Other Books You May Enjoy

Leave a review - let other readers know what you think

The Naive Bayes algorithm

The Naive Bayes algorithm uses Bayes' theorem to classify data into classes or categories. The algorithm is called naive because it assumes that all attributes (features) are independent of one another, given the class. This assumption rarely holds in practice, because the features in a real dataset are usually related to one another in some way.
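To make the naive assumption concrete, here is a minimal from-scratch sketch on a small, made-up transaction dataset (the data, the `fit` and `predict_proba` helpers, and the feature names are all hypothetical, not from the book). Instead of estimating the joint likelihood of all features for each class, it multiplies the per-feature likelihoods, which is exactly what the independence assumption permits. It uses plain counts with no smoothing.

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset (not from the book): each row is
# ((amount_bucket, is_foreign), label).
rows = [
    (("high", "yes"), "fraud"),
    (("high", "no"), "fraud"),
    (("high", "yes"), "fraud"),
    (("low", "no"), "legit"),
    (("low", "yes"), "legit"),
    (("high", "no"), "legit"),
    (("low", "no"), "legit"),
    (("low", "no"), "legit"),
]


def fit(rows):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in rows)
    # feature_counts[label][i][value]: times feature i equals value within class label
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in rows:
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts


def predict_proba(x, class_counts, feature_counts):
    """Posterior p(class | x) under the naive independence assumption."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior p(h), estimated from class frequencies
        for i, value in enumerate(x):
            # Naive step: multiply the per-feature likelihoods p(x_i | h)
            # instead of estimating the joint p(x_1, x_2 | h).
            score *= feature_counts[label][i][value] / n_label
        scores[label] = score
    evidence = sum(scores.values())  # normalizer playing the role of p(D)
    if evidence == 0:
        raise ValueError("feature values never seen in training; add smoothing")
    return {label: s / evidence for label, s in scores.items()}


class_counts, feature_counts = fit(rows)
probs = predict_proba(("high", "yes"), class_counts, feature_counts)
print(probs)
```

For the query `("high", "yes")`, the fraud score is 3/8 × 3/3 × 2/3 and the legitimate score is 5/8 × 1/5 × 1/5, so after normalizing, the posterior for fraud is 10/11. Because this sketch skips smoothing, a feature value never seen with a class contributes a zero factor; scikit-learn's `CategoricalNB` avoids this with additive smoothing controlled by its `alpha` parameter.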

Despite this naive assumption, the algorithm performs well in practice. Bayes' theorem is given by the following formula:

$$p(h \mid D) = \frac{p(D \mid h)\, p(h)}{p(D)}$$
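As a worked example of Bayes' theorem, p(h|D) = p(D|h) p(h) / p(D), the following sketch plugs in made-up numbers for a fraud scenario (all probabilities are hypothetical, chosen only for illustration): h is "the transaction is fraudulent" and D is "a rule flagged the transaction". The evidence p(D) is obtained from the law of total probability over the two hypotheses.

```python
# Hypothetical numbers for illustration only (not from the book):
# h = "transaction is fraudulent", D = "transaction was flagged by a rule".
p_h = 0.01              # prior p(h): 1% of transactions are fraudulent
p_d_given_h = 0.9       # likelihood p(D|h): the rule flags 90% of frauds
p_d_given_not_h = 0.05  # p(D|not h): the rule also flags 5% of legitimate ones

# Evidence p(D) via the law of total probability.
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

# Bayes' theorem: p(h|D) = p(D|h) * p(h) / p(D)
p_h_given_d = p_d_given_h * p_h / p_d
print(f"p(D) = {p_d:.4f}, p(h|D) = {p_h_given_d:.4f}")
```

With these made-up numbers, p(D) = 0.0585 and p(h|D) = 2/13, roughly 0.15: when fraud is rare, even a rule that catches 90% of frauds produces mostly false alarms, which is the kind of reasoning the prior term p(h) captures.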

We can split the preceding formula into the following components:

p(h|D): This is the probability of a hypothesis h being true, given the dataset D (the posterior probability). An example of this would be the probability that a transaction is fraudulent, given a dataset that consists of fraudulent and non-fraudulent transactions.
p(D|h): This is the probability...