Designing Machine Learning Systems with Python

By: David Julian

Overview of this book

Machine learning is one of the fastest-growing trends in modern computing. It has applications in a wide range of fields, including economics, the natural sciences, web development, and business modeling. To harness the power of these systems, the practitioner must develop a solid understanding of the underlying design principles. There are many reasons why machine learning models may not give accurate results. By looking at these systems from a design perspective, we gain a deeper understanding of the underlying algorithms and the optimisation methods that are available. This book will give you a solid foundation in the machine learning design process and enable you to build customised machine learning models to solve unique problems. You may already know about, or have worked with, some of the off-the-shelf machine learning models for solving common problems such as spam detection or movie classification, but to begin solving more complex problems, it is important to adapt these models to your own specific needs. This book will give you this understanding and more.

Designing Machine Learning Systems with Python

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

Thinking in Machine Learning

The human interface

Design principles

Summary

Tools and Techniques

Python for machine learning

IPython console

Installing the SciPy stack

NumPy

Matplotlib

Pandas

SciPy

Scikit-learn

Summary

Turning Data into Information

Signals

Summary

Models – Learning from Information

Logical models

Tree models

Rule models

Summary

Linear Models

Introducing least squares

Logistic regression

Multiclass classification

Regularization

Summary

Neural Networks

Getting started with neural networks

Logistic units

Cost function

Implementing a neural network

Gradient checking

Other neural net architectures

Summary

Features – How Algorithms See the World

Feature types

Operations and statistics

Structured features

Transforming features

Principal component analysis

Summary

Learning with Ensembles

Ensemble types

Bagging

Boosting

Ensemble strategies

Summary

Design Strategies and Case Studies

Evaluating model performance

Model selection

Learning curves

Real-world case studies

Machine learning at a glance

Summary

Index

Index

A

abstraction / Structured features
adaptive boosting (AdaBoost) / Adaboost
agglomerative / Discretization
Anaconda
- reference link / Installing the SciPy stack
anomaly detection / Dimensionality reduction
approaches, data volume issue
- efficiency / Data volume
- scalability / Data volume
- parallelism / Data volume
approaches, machine learning problem
- exploratory / Types of questions
- descriptive / Types of questions
- inferential / Types of questions
- predictive / Types of questions
- causal / Types of questions
- mechanistic / Types of questions
approaches, Sklearn
- estimator score / Evaluating model performance
- scoring parameters / Evaluating model performance
- metric functions / Evaluating model performance
association rule learning / Set-based rule models
association rule mining / Set-based rule models
audioSamp.wav file
- download link / Data from sound
averaging method / Ensemble types

B

bagging
- about / Bagging
- random forests / Random forests
- extra trees / Extra trees
batch gradient descent / Gradient descent
Bayes classifier / Operations and statistics
beam search / Set-based rule models
bias feature / Gradient descent
Big Data
- about / Big data
- challenges / Challenges of big data
- data models / Data models
- data distributions / Data distributions
- data, from distributions / Data distributions
- data, from databases / Data from databases
- data, obtaining from Web / Data from the Web
- data, obtaining from natural language / Data from natural language
- data, obtaining from images / Data from images
- data, obtaining from application programming interfaces / Data from application programming interfaces
bin / Discretization
binarization / Transforming features
binary splits / Features
Boolean feature / Categorical features
boosting
- about / Boosting
- AdaBoost (adaptive boosting) / Adaboost
- gradient boosting / Gradient boosting
boosting method / Ensemble types
broadcasting / Mathematical operations
broad settings, tasks
- supervised learning / Tasks
- unsupervised learning / Tasks
- reinforcement learning / Tasks
bucketing / Other methods

C

calibration / Calibration
Canopy
- reference link / Installing the SciPy stack
categorical features / Categorical features
central moment / Operations and statistics
challenges, Big Data
- about / Challenges of big data
- data volume / Data volume
- data velocity / Data velocity
- data variety / Data variety
classification error / Purity
closed form solution / Regularization
collaborative filtering approach / Building a recommender system, Collaborative filtering
computational complexity / PAC learning and computational complexity
conjunctively separable / Version space
content-based filtering approach / Content-based filtering
corpora / Data from natural language
cost function
- about / Cost function
- minimizing / Minimizing the cost function
cost function, logistic regression / The Cost function for logistic regression
coverage space / Coverage space
Cumulative Distribution Function (CDF) / Data distributions

D

data
- reference link / Pandas
- about / What is data?
- models / Data models
- distributions / Data distributions
- obtaining, from databases / Data from databases
- obtaining, from Web / Data from the Web
- obtaining, from natural language / Data from natural language
- obtaining, from images / Data from images
- obtaining, from application programming interfaces / Data from application programming interfaces
- cleaning / Cleaning data
- visualizing / Visualizing data
databases
- used, for solving issues / Data models
data models
- about / Data models
data models, components
- structure / Data models
- constraints / Data models
- operations / Data models
decision boundary / Logistic regression
deep architecture / Other neural net architectures
descriptive models / Set-based rule models
design principles
- about / Design principles
- question, types / Types of questions
- right question, asking / Are you asking the right question?
- tasks / Tasks
- unified modelling language (UML) / Unified modeling language
discretization / Discretization
distro / Installing the SciPy stack
divisive / Discretization
downhill simplex algorithm / SciPy

E

ElasticNet / Gradient descent
ensemble
- types / Ensemble types
- strategies / Ensemble strategies
- techniques / Other methods
ensemble techniques
- about / Ensemble types
- averaging method / Ensemble types
- boosting method / Ensemble types
entering variable / Linear programming
equal frequency discretization / Discretization
equal width discretization / Discretization
extra trees
- about / Extra trees
ExtraTreesClassifier class / Extra trees
ExtraTreesRegressor class / Extra trees

F

feature
- types / Feature types
- transforming / Transforming features
- discretization / Discretization
- normalization / Normalization
- calibration / Calibration
feature types
- quantitative features / Quantitative features
- ordinal features / Ordinal features
- categorical features / Categorical features
Fourier Transform / Signals
functions, for impurity measures
- Gini index / Purity
- Entropy / Purity

G

global minimum / Gradient descent
gradient boosting / Gradient boosting
gradient checking / Gradient checking

H

Hadoop / Data volume
Hamming distance / Ordinal features
human interface
- about / The human interface
hyperparameter / Gradient descent
hypothesis space / Logical models

I

imputation / Calibration
inductive logic programming
- about / Structured features
instance space / Logical models
integrated pest management systems greenhouses
- about / Insect detection in greenhouses
- reviewing / Reviewing the case study
internal disjunction / Version space
IPython console
- about / IPython console
item-based collaborative filtering / Collaborative filtering

J

Jupyter
- reference link / IPython console

K

k-fold cross validation / Evaluating model performance
kernel trick / Features
k-nearest neighbours (k-NN)
- about / Scikit-learn
- sklearn.neighbors.KNeighborsClassifier / Scikit-learn
- sklearn.neighbors.RadiusNeighborsClassifier / Scikit-learn
kurtosis / Operations and statistics

L

L1 norm / Regularization
lambda / Regularization
lambda (λ) / Data distributions
Large Hadron Collider / Data velocity
Lasso / Gradient descent
lasso regression / Regularization
learning curves
- using / Learning curves
least general generalization (LGG) / Generality ordering
least squares
- about / Introducing least squares
- gradient descent / Gradient descent
- normal equation / The normal equation
leaving variable / Linear programming
Linear programming (LP) / Linear programming
local minimum / Gradient descent
logical models
- about / Logical models
- generality ordering / Generality ordering
- version space / Version space
- coverage space / Coverage space
- Probably Approximately Correct (PAC) learning / PAC learning and computational complexity
- computational complexity / PAC learning and computational complexity
logistic calibration / Calibration
logistic function / Logistic regression
logistic regression
- about / Logistic regression
- cost function / The Cost function for logistic regression
logistic units
- about / Logistic units

M

machine learning
- about / Machine learning at a glance
machine learning systems
- activities / Design principles
MapReduce / Data volume
margin / Ensemble strategies
mathematical operations, NumPy
- about / Mathematical operations
- vectors / Mathematical operations
- polynomial functions / Mathematical operations
Matlab / Python for machine learning
Matplotlib
- about / Installing the SciPy stack, Matplotlib
mean (µ (mu)) / Data distributions
meta-estimators / Multiclass classification
meta-learning / Other methods
meta-model / Other methods
MNIST dataset
- reference link / Implementing a neural network
model performance
- evaluating / Evaluating model performance
- measuring, with precision (P) / Evaluating model performance
- measuring, with recall (R) / Evaluating model performance
models
- about / Models
- grouping approach / Models
- geometric models / Geometric models
- probabilistic models / Probabilistic models
- logical models / Logical models
- rule models / Logical models, Rule models
- tree models / Tree models
model selection
- about / Model selection
- Gridsearch, using / Gridsearch
Moore's law / Data volume
Mr Clippy / The human interface
multiclass classification
- about / Multiclass classification
- one versus all technique / Multiclass classification
- one versus one technique / Multiclass classification
multicollinearity / Scikit-learn
MySQL server
- reference link / Data from databases

N

Natural Language Tool Kit (NLTK) / Data from natural language
Nelder-Mead solver / SciPy
neural net architectures
- about / Other neural net architectures
- deep architecture / Other neural net architectures
- recurrent neural networks (RNNs) / Other neural net architectures
neural network
- starting with / Getting started with neural networks
- implementing / Implementing a neural network
Ngram
- reference link / The human interface
normal equation / The normal equation
normalization / Normalization
NumPy
- about / Installing the SciPy stack, NumPy
- arrays, constructing / Constructing and transforming arrays
- arrays, transforming / Constructing and transforming arrays
- mathematical operations / Mathematical operations

O

one hot encoding / Transforming features
one versus all technique / Multiclass classification
one versus one technique / Multiclass classification
operations / Operations and statistics
ordered list approach / The ordered list approach
ordinal features / Ordinal features
Ordinary Least Squares / Scikit-learn
orthogonal function / Signals

P

packages, SciPy
- cluster / SciPy
- constants / SciPy
- integrate / SciPy
- interpolate / SciPy
- io / SciPy
- optimize / SciPy
- linalg / SciPy
- ndimage / SciPy
- odr / SciPy
- stats / SciPy
Pandas
- about / Pandas
parameters, bagging
- base_estimator / Bagging
- n_estimators / Bagging
- max_samples / Bagging
- max_features / Bagging
- bootstrap / Bagging
- bootstrap_features / Bagging
pdfminer3k / Cleaning data
pdftotext / Cleaning data
pivoting / Linear programming
polynomial regression / Gradient descent
Portable Document Format (PDF) / Cleaning data
principal component analysis (PCA)
- about / Principal component analysis
Principal Component Analysis (PCA) / Features, Scikit-learn
probability density function / Data distributions
Probably Approximately Correct (PAC) learning / PAC learning and computational complexity
Python
- used, for machine learning / Python for machine learning

Q

quantitative features / Quantitative features

R

R / Python for machine learning
random forest
- about / Random forests
real-world case studies
- about / Real-world case studies
- recommender system, building / Building a recommender system
- integrated pest management systems greenhouses / Insect detection in greenhouses
Receiver Operating Characteristic (ROC) / Calibration
recommender system
- building / Building a recommender system
- content-based filtering approach / Content-based filtering
- collaborative filtering approach / Collaborative filtering
- reviewing / Reviewing the case study
recurrent neural networks (RNNs) / Other neural net architectures
regularization
- about / Regularization
Ridge / Gradient descent
ridge regression / Regularization
Rosenbrock function / SciPy
rule models
- purity / Rule models
- ordered list approach / The ordered list approach
- set-based rule models / Set-based rule models

S

sample complexity / PAC learning and computational complexity
scikit-learn
- about / Scikit-learn
SciPy
- about / SciPy
- packages / SciPy
- reference link / SciPy
SciPy stack
- installing / Installing the SciPy stack
- reference link / Installing the SciPy stack
- about / Installing the SciPy stack
segment / Models
set-based rule models / Set-based rule models
SGDClassifier / Gradient descent
SGDRegressor / Gradient descent
shrinkage / Scikit-learn
sigmoid function / Logistic regression
signals
- about / Signals
- data, obtaining from sound / Data from sound
similarity function / Features
skewness / Operations and statistics
stacked generalization / Other methods
stacking / Other methods
standard deviation (σ (sigma)) / Data distributions
standardized features / Calibration
statistics
- about / Operations and statistics
- dispersion / Operations and statistics
- shape / Operations and statistics
- central tendency / Operations and statistics
Stochastic gradient descent / Gradient descent
stratified cross validation / Evaluating model performance
streaming processing / Data velocity
structured features
- about / Structured features
subgroup discovery / Set-based rule models
subspace sampling / Bagging, Random forests
sum of the squared error / Introducing least squares
Support Vector Machines (SVM) / Features

T

task classification
- binary classification / Classification
- multiclass classification / Classification
tasks
- about / Tasks
- classification / Classification
- regression / Regression
- clustering / Clustering
- dimensionality reduction / Dimensionality reduction
- errors / Errors
- optimization problems / Optimization
- linear programming / Linear programming
- models / Models
- features / Features
term-document matrix / Content-based filtering
Thrip / Visualizing data
tree models
- about / Tree models
- purity / Purity

U

unified modelling language (UML)
- about / Unified modeling language
- class diagrams / Class diagrams
- object diagrams / Object diagrams
- activity diagrams / Activity diagrams
- state diagrams / State diagrams

V

version space / Version space

W

weak learnability / Boosting
Whitefly / Visualizing data
WordNet
- reference link / The human interface
world database / Data from databases
