Building Machine Learning Systems with Python - Third Edition

By: Luis Pedro Coelho, Willi Richert, Matthieu Brucher

Overview of this book

Machine learning enables systems to make predictions based on historical data. Python is one of the most popular languages used to develop machine learning applications, thanks to its extensive library support. This updated third edition of Building Machine Learning Systems with Python helps you get up to speed with the latest trends in artificial intelligence (AI). With this guide’s hands-on approach, you’ll learn to build state-of-the-art machine learning models from scratch. Complete with ready-to-implement code and real-world examples, the book starts by introducing the Python ecosystem for machine learning. You’ll then learn best practices for preparing data for analysis and later gain insights into implementing supervised and unsupervised machine learning techniques such as classification, regression, and clustering. As you progress, you’ll understand how to use Python’s scikit-learn and TensorFlow libraries to build production-ready, end-to-end machine learning systems and then fine-tune them for high performance. By the end of this book, you’ll have the skills you need to confidently train and deploy enterprise-grade machine learning models in Python.

Preface

Who this book is for

What this book covers

To get the most out of this book

Getting Started with Python Machine Learning

Machine learning and Python – a dream team

Classifying with Real-World Examples

The Iris dataset

Evaluation – holding out data and cross-validation

How to measure and compare classifiers

A more complex dataset and the nearest-neighbor classifier

Which classifier to use

Regression

Predicting house prices with regression

Multidimensional regression

Cross-validation for regression

Using Lasso or ElasticNet in scikit-learn

Regression with TensorFlow

Classification I – Detecting Poor Answers

Sketching our roadmap

Learning to classify classy answers

Fetching the data

Creating our first classifier

Deciding how to improve the performance

Using logistic regression

Looking behind accuracy – precision and recall

Slimming the classifier

Classification using TensorFlow

Dimensionality Reduction

Sketching our roadmap

Selecting features

Feature projection

Multidimensional scaling

Autoencoders, or neural networks for dimensionality reduction

Clustering – Finding Related Posts

Measuring the relatedness of posts

Preprocessing – similarity measured as a similar number of common words

Solving our initial challenge

Tweaking the parameters

Recommendations

Rating predictions and recommendations

Splitting into training and testing

Normalizing the training data

A neighborhood approach to recommendations

A regression approach to recommendations

Combining multiple methods

Basket analysis

Association rule mining

Artificial Neural Networks and Deep Learning

Using TensorFlow

Saving and restoring neural networks

LSTM for predicting text

LSTM for image processing

Classification II – Sentiment Analysis

Sketching our roadmap

Fetching the Twitter data

Introducing the Naïve Bayes classifier

Creating our first classifier and tuning it

Cleaning tweets

Taking the word types into account

Topic Modeling

Latent Dirichlet allocation

Classification III – Music Genre Classification

Sketching our roadmap

Fetching the music data

Looking at music

Using FFT to build our first classifier

Improving classification performance with mel frequency cepstral coefficients

Music classification using TensorFlow

Computer Vision

Introducing image processing

Basic image classification

Computing features from images

Writing your own features

Using features to find similar images

Classifying a harder dataset

Local feature representations

Image generation with adversarial networks

Reinforcement Learning

Types of reinforcement learning

Excelling at games

Bigger Data

Learning about big data

Looking under the hood

Using jug for data analysis

Reusing partial results

Using Amazon Web Services

Creating your first virtual machines

Installing Python packages on Amazon Linux

Running jug on our cloud machine

Automating the generation of clusters with cfncluster

Where to Learn More About Machine Learning

All that was left out

Other Books You May Enjoy

Leave a review - let other readers know what you think

Slimming the classifier

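It is always worth looking at the actual contributions of the individual features. For logistic regression, we can directly take the learned coefficients (clf.coef_) to get an impression of the features' impact. A positive coefficient means that higher values of that feature push a post toward being classified as good, a negative coefficient means that higher values push it toward being classified as bad, and the larger the magnitude, the stronger the feature's influence.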
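Here is a minimal sketch of this inspection, assuming scikit-learn's `LogisticRegression` and purely synthetic data. The feature names are taken from the text, but the feature values and the hypothetical "true" weights used to generate the labels are invented for illustration and do not reproduce the book's results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Feature names as used in the text; the data below is synthetic.
feature_names = ["LinkCount", "NumCodeLines", "NumTextTokens", "AvgSentLen",
                 "AvgWordLen", "NumAllCaps", "NumExclams"]

rng = np.random.default_rng(42)
n_samples = 1000
X = rng.normal(size=(n_samples, len(feature_names)))

# Hypothetical weights, used only to generate labels (1 = good answer).
true_w = np.array([1.0, 1.5, 2.0, 0.0, 0.5, -1.0, -1.5])
noise = rng.normal(scale=0.5, size=n_samples)
y = (X @ true_w + noise > 0).astype(int)

# Put all features on the same scale so coefficient magnitudes are comparable.
X_scaled = StandardScaler().fit_transform(X)

clf = LogisticRegression()
clf.fit(X_scaled, y)

# For a binary problem, clf.coef_ has shape (1, n_features).
coefs = clf.coef_.ravel()

# Rank features from most positive to most negative coefficient.
ranking = sorted(zip(feature_names, coefs), key=lambda fc: fc[1], reverse=True)
for name, coef in ranking:
    print(f"{name:>14s} {coef:+.2f}")
```

The sketch standardizes the features first because coefficient magnitudes are only comparable when the features share a scale. Comparing raw coefficients of a count such as NumTextTokens with those of a ratio such as AvgWordLen could be misleading.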

We see that NumTextTokens has the highest positive impact on determining whether a post is a good one, while AvgWordLen, LinkCount, and NumCodeLines have a say in that as well, but much less so. This means that more verbose answers are more likely to be classified as good.

On the other side, NumAllCaps and NumExclams have negative weights. That means that the more an answer shouts, the less likely it is to be received well.

Then we have the AvgSentLen feature, which does not seem to help much in detecting a good answer. We could easily drop that feature and retain...