Advanced Machine Learning with Python

Overview of this book

Designed to take you on a guided tour of the most relevant and powerful machine learning techniques in use today by top data scientists, this book is just what you need to push your Python algorithms to their maximum potential. Clear descriptions of how techniques work and detailed code examples demonstrate deep learning, semi-supervised learning, and more, all while working with real-world applications that include image, music, text, and financial data. The techniques covered here are at the forefront of commercial practice: they are now applicable in contexts such as image recognition, NLP and web search, computational creativity, and commercial/financial data modeling. Deep learning algorithms and ensembles of models are in use by data scientists at top tech and digital companies, but the skills needed to apply them successfully, while in high demand, are still scarce. Along the way you will also work with libraries such as NumPy and Theano. By the end of this book, you will have learned a set of advanced machine learning techniques and acquired a broad, powerful skill set in feature selection and feature engineering.
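As a small taste of the kind of technique the book walks through, a dimensionality-reduction step such as principal component analysis can be sketched in a few lines of scikit-learn. The dataset and parameter choices below are illustrative assumptions, not examples taken from the book itself:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load a small, well-known dataset: 150 iris samples, 4 features each.
X = load_iris().data

# Project the 4-dimensional feature space onto its 2 leading
# principal components.
pca = PCA(n_components=2)
reduced = pca.fit_transform(X)

print(reduced.shape)  # (150, 2)
# Fraction of the original variance retained by the 2 components:
print(pca.explained_variance_ratio_.sum())
```

Reducing to the top components while keeping most of the variance is the core idea developed in the book's principal component analysis chapter, where it is applied before clustering and classification.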
Table of Contents (17 chapters)
Advanced Machine Learning with Python
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Chapter Code Requirements
Index

Index

A

  • AdaBoost
    • about / Applying boosting methods
  • Adjusted Rand Index (ARI)
    • about / Kick-starting clustering analysis
  • Area Under the Curve (AUC)
    • about / Testing our prepared data, Testing the performance of our model
  • autoencoders
    • about / Autoencoders, Introducing the autoencoder
    • topology / Topology
    • training / Training
    • denoising / Denoising autoencoders
  • averaging ensembles
    • about / Understanding averaging ensembles
    • bagging algorithms, using / Using bagging algorithms
    • random forests, using / Using random forests

B

  • backoff taggers
    • about / Backoff tagging
  • backoff tagging
    • about / Backoff tagging
  • bagging
    • about / Bagging and random forests, Using bagging algorithms
  • bagging algorithms
    • using / Using bagging algorithms
  • Batch Normalization
    • about / Applying a CNN
  • BeautifulSoup
    • text data, cleaning / Text cleaning with BeautifulSoup
  • Best Matching Unit (BMU)
    • about / SOM – a primer
  • Bing Traffic API
    • about / Acquiring data via RESTful APIs, The Bing Traffic API
  • blend-of-blends / Using stacking ensembles
  • Blocks / Knowing when to use these libraries
  • boosting methods
    • applying / Applying boosting methods
    • Extreme Gradient Boosting (XGBoost), using / Using XGBoost
  • Borda count / Strategies to managing model robustness
  • Brill taggers
    • about / Backoff tagging

C

  • carp
    • about / Sequential tagging
  • Champion/Challenger / Strategies to managing model robustness
  • CIFAR-10 dataset
    • about / Understanding pooling layers
  • clustering
    • about / Clustering – a primer
  • completeness score
    • about / Kick-starting clustering analysis
  • composable layer / Understanding the convnet topology
  • Contrastive Pessimistic Likelihood Estimation (CPLE)
    • about / Introduction, Contrastive Pessimistic Likelihood Estimation
  • convnet topology
    • about / Understanding the convnet topology
    • pooling layers / Understanding pooling layers
    • training / Training a convnet
    • forward pass / Training a convnet
    • backward pass / Training a convnet
    • implementing / Putting it all together
  • convolutional neural networks (CNN)
    • about / Introducing the CNN, Introduction to TensorFlow
    • convnet topology / Understanding the convnet topology
    • convolution layers / Understanding convolution layers
    • applying / Applying a CNN
  • convolution layers
    • about / Understanding convolution layers
  • correlation
    • about / Correlation
  • covariance
    • about / PCA – a primer

D

  • data
    • acquiring, via Twitter / Twitter
  • deep belief network (DBN)
    • about / Deep belief networks
    • training / Training a DBN
    • applying / Applying the DBN
    • validating / Validating the DBN
  • DeepFace
    • about / Introducing the CNN
  • denoising autoencoders (dA)
    • about / Denoising autoencoders
    • applying / Applying a dA
  • DepthConcat element
    • about / Putting it all together
  • development tools
    • about / Alternative development tools
    • Lasagne / Alternative development tools
    • TensorFlow / Alternative development tools
    • libraries usage, deciding / Knowing when to use these libraries
  • Diabolo network
    • about / Autoencoders
  • dynamic applications
    • models, using / Using models in dynamic applications

E

  • eigenvalue
    • about / PCA – a primer
  • eigenvector
    • about / PCA – a primer
  • elbow method
    • about / Tuning your clustering configurations, Applying boosting methods
  • ensembles
    • about / Introducing ensembles
    • averaging ensembles / Understanding averaging ensembles
    • boosting methods, applying / Applying boosting methods
    • stacking ensembles, using / Using stacking ensembles
    • applying / Applying ensembles in practice
  • Extreme Gradient Boosting (XGBoost)
    • using / Using XGBoost
  • extremely randomized trees (ExtraTrees)
    • about / Using random forests

F

  • Fast Fourier Transform
    • about / Training a convnet
  • feature engineering
    • about / Introduction, Feature engineering in practice
    • data, acquiring via RESTful APIs / Acquiring data via RESTful APIs
    • variables, deriving / Deriving and selecting variables using feature engineering techniques
    • variables, selecting / Deriving and selecting variables using feature engineering techniques
    • weather API, creating / The weather API
  • feature engineering, for ML applications
    • about / Engineering features for ML applications
    • rescaling techniques, using / Using rescaling techniques to improve the learnability of features
    • effective derived variables, creating / Creating effective derived variables
    • non-numeric features, reinterpreting / Reinterpreting non-numeric features
  • feature selection
    • techniques, using / Using feature selection techniques
    • performing / Performing feature selection
    • correlation / Correlation
    • LASSO / LASSO
    • Recursive Feature Elimination (RFE) / Recursive Feature Elimination
    • genetic models / Genetic models
  • feature set
    • creating / Creating a feature set
    • feature engineering, for ML applications / Engineering features for ML applications
    • feature selection techniques, using / Using feature selection techniques
  • Fisher's discriminant ratio
    • about / Finessing your self-training implementation
  • Fully Connected layer
    • about / Putting it all together

G

  • genetic models
    • about / Genetic models
  • Gibbs sampling
    • about / Training
  • Gini Impurity (gini) / Using stacking ensembles
  • Go
    • about / Introducing the CNN
  • GoogLeNet / Introducing the CNN
    • about / Putting it all together
  • gradient descent algorithms
    • URL / Using rescaling techniques to improve the learnability of features

H

  • h-dimensional representation / Introducing the autoencoder
  • heart dataset
    • URL / Implementing self-training
  • hierarchical grouping
    • about / Stacked Denoising Autoencoders
  • homogeneity score
    • about / Kick-starting clustering analysis

I

  • i-dimensional input
    • about / Introducing the autoencoder
  • ImageNet
    • about / Introducing the CNN
  • Inception network
    • about / Putting it all together

K

  • k-means clustering
    • about / Introducing k-means clustering
    • clustering / Clustering – a primer
    • clustering analysis / Kick-starting clustering analysis
    • configuration, tuning / Tuning your clustering configurations
  • K-Nearest Neighbors (KNN) / Using bagging algorithms
  • Keras / Knowing when to use these libraries

L

  • Lasagne
    • about / Introduction to Lasagne, Getting to know Lasagne
  • LASSO
    • about / LASSO
  • LeNet
    • about / Putting it all together
  • libraries
    • usage, deciding / Knowing when to use these libraries

M

  • Markov Chain Monte Carlo (MCMC)
    • about / Training
  • max-pooling
    • about / Understanding pooling layers
  • mean-pooling
    • about / Understanding pooling layers
  • modeling risk factors
    • longitudinally variant / Identifying modeling risk factors
    • slow change / Identifying modeling risk factors
    • key parameter / Identifying modeling risk factors
  • models
    • using, in dynamic applications / Using models in dynamic applications
    • robustness / Understanding model robustness
    • modeling risk factors, identifying / Identifying modeling risk factors
    • robustness, managing / Strategies to managing model robustness
  • Motor Vehicle Accident (MVA) / Translink Twitter
  • Multi-Layer Perceptron (MLP)
    • about / The composition of a neural network
  • multicollinearity / Correlation

N

  • n-dimensional input
    • about / Denoising autoencoders
  • n-gram tagger
    • about / Sequential tagging
  • Natural Language Toolkit (NLTK)
    • about / Tagging and categorising words
    • used, for tagging / Tagging with NLTK
  • Network In Network (NIN)
    • about / Putting it all together
  • network topologies
    • about / Network topologies
  • neural networks
    • about / Neural networks – a primer
    • composition / The composition of a neural network
    • learning process / The composition of a neural network
    • neurons / The composition of a neural network
    • connectivity functions / The composition of a neural network
    • network topologies / Network topologies

O

  • OpinRank Review dataset
    • about / Applying the SdA
    • URL / Applying the SdA
  • orthogonalization
    • about / PCA – a primer
  • orthonormalization
    • about / PCA – a primer
  • overcomplete
    • about / Denoising autoencoders

P

  • Permanent Contrastive Divergence (PCD)
    • about / Training
  • Platt calibration
    • about / Implementing self-training
  • pooling layers
    • about / Understanding pooling layers
  • Porter stemmer
    • about / Stemming
  • Pragmatic Chaos model / Using stacking ensembles
  • price-earnings (P/E) ratio / Creating effective derived variables
  • principal component analysis (PCA)
    • about / Principal component analysis
    • features / PCA – a primer
    • employing / Employing PCA
  • Pylearn2 / Knowing when to use these libraries

R

  • random forests
    • about / Bagging and random forests, Testing our prepared data
    • using / Using random forests
  • random patches
    • about / Bagging and random forests, Using bagging algorithms
  • random subspaces
    • about / Using bagging algorithms
  • Rectified Linear Units (ReLU)
    • about / Putting it all together
  • Recursive Feature Elimination (RFE) / Performing feature selection
    • about / Recursive Feature Elimination
  • RESTful APIs
    • data, acquiring / Acquiring data via RESTful APIs
    • model performance, testing / Testing the performance of our model
  • Restricted Boltzmann Machine (RBM)
    • about / Restricted Boltzmann Machine, Introducing the RBM
    • topology / Topology
    • training / Training
    • applications / Applications of the RBM, Further applications of the RBM
  • Root Mean Squared Error (RMSE)
    • about / Genetic models

S

  • scikit-learn
    • about / Employing PCA
  • self-organizing maps (SOM)
    • about / Self-organizing maps, SOM – a primer, The composition of a neural network
    • employing / Employing SOM
  • self-training
    • about / Self-training
    • implementing / Implementing self-training
    • improving / Finessing your self-training implementation
    • selection process, improving / Improving the selection process
    • Contrastive Pessimistic Likelihood Estimation (CPLE) / Contrastive Pessimistic Likelihood Estimation
  • semi-supervised algorithms
    • using / Semi-supervised algorithms in action
  • semi-supervised learning
    • about / Introduction, Understanding semi-supervised learning
    • self-training / Self-training
  • sequential tagging
    • about / Sequential tagging
  • Silhouette Coefficient
    • about / Kick-starting clustering analysis
  • stacked denoising autoencoders (SdA)
    • about / Stacked Denoising Autoencoders
    • applying / Applying the SdA
    • performance, assessing / Assessing SdA performance
  • stacking ensembles
    • using / Using stacking ensembles
  • stemming
    • about / Stemming
  • Stochastic Gradient Descent (SGD)
    • about / Implementing self-training
  • stride
    • about / Understanding convolution layers
  • subtaggers
    • about / Backoff tagging
  • sum-pooling
    • about / Understanding pooling layers
  • Support Vector Classification (SVC)
    • about / Recursive Feature Elimination

T

  • tagging
    • with Natural Language Toolkit (NLTK) / Tagging with NLTK
    • sequential tagging / Sequential tagging
    • backoff tagging / Backoff tagging
  • TB-scale datasets
    • about / Clustering – a primer
  • tensor / Understanding convolution layers
  • TensorFlow
    • about / Introduction to TensorFlow, Getting to know TensorFlow
    • using / Using TensorFlow to iteratively improve our models
  • TensorFlow library
    • about / Understanding convolution layers
  • text data
    • cleaning / Cleaning text data
    • cleaning, with BeautifulSoup / Text cleaning with BeautifulSoup
    • punctuation, managing / Managing punctuation and tokenizing
    • tokenisation, managing / Managing punctuation and tokenizing
    • words, categorizing / Tagging and categorising words
    • words, tagging / Tagging and categorising words
    • features, creating / Creating features from text data
  • text feature engineering
    • about / Text feature engineering
    • text data, cleaning / Cleaning text data
    • stemming / Stemming
    • bagging / Bagging and random forests
    • random forests / Bagging and random forests
    • prepared data, testing / Testing our prepared data
  • Theano
    • about / Denoising autoencoders
  • tokenisation
    • about / Managing punctuation and tokenizing
  • transforming autoencoder
    • about / Understanding pooling layers
  • translation-invariance
    • about / Understanding pooling layers
  • Translink Twitter
    • about / Translink Twitter
  • trigram tagger
    • about / Sequential tagging
  • Twitter
    • using / Twitter
    • Translink Twitter, using / Translink Twitter
    • consumer comments, analyzing / Consumer comments
    • Bing Traffic API / The Bing Traffic API

U

  • U-Matrix
    • about / Employing SOM
  • unigram tagger
    • about / Sequential tagging

V

  • v-fold cross-validation
    • about / Tuning your clustering configurations
  • validity measure (v-measure)
    • about / Kick-starting clustering analysis

W

  • weather API
    • creating / The weather API

Y

  • Yahoo Weather API
    • about / Acquiring data via RESTful APIs

Z

  • Zipf distribution
    • about / Reinterpreting non-numeric features