Advanced Machine Learning with Python

Overview of this book

Designed to take you on a guided tour of the most relevant and powerful machine learning techniques in use today by top data scientists, this book is just what you need to push your Python algorithms to their maximum potential. Clear descriptions of how techniques work and detailed code examples demonstrate deep learning, semi-supervised learning, and more, all while working with real-world applications that include image, music, text, and financial data. The techniques covered here are at the forefront of commercial practice: they are now applicable in contexts such as image recognition, NLP and web search, computational creativity, and commercial/financial data modeling. Deep learning algorithms and ensembles of models are in use by data scientists at top tech and digital companies, but the skills needed to apply them successfully, while in high demand, are still scarce. Along the way you will also work with libraries such as NumPy and Theano. By the end of this book, you will have learned a set of advanced machine learning techniques and acquired a broad, powerful skill set in feature selection and feature engineering.
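As a small taste of the kind of technique the book walks through, a dimensionality-reduction step such as principal component analysis can be sketched in a few lines of scikit-learn. The dataset and parameter choices below are illustrative assumptions, not examples taken from the book itself:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load a small, well-known dataset: 150 iris samples, 4 features each.
X = load_iris().data

# Project the 4-dimensional feature space onto its 2 leading
# principal components.
pca = PCA(n_components=2)
reduced = pca.fit_transform(X)

print(reduced.shape)  # (150, 2)
# Fraction of the original variance retained by the 2 components:
print(pca.explained_variance_ratio_.sum())
```

Reducing to the top components while keeping most of the variance is the core idea developed in the book's principal component analysis chapter, where it is applied before clustering and classification.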
Table of Contents (17 chapters)
Advanced Machine Learning with Python
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Chapter Code Requirements
Index

Index

A

  • AdaBoost
    • about / Applying boosting methods
  • Adjusted Rand Index (ARI)
    • about / Kick-starting clustering analysis
  • Area Under the Curve (AUC)
    • about / Testing our prepared data, Testing the performance of our model
  • autoencoders
    • about / Autoencoders, Introducing the autoencoder
    • topology / Topology
    • training / Training
    • denoising / Denoising autoencoders
  • averaging ensembles
    • about / Understanding averaging ensembles
    • bagging algorithms, using / Using bagging algorithms
    • random forests, using / Using random forests

B

  • backoff taggers
    • about / Backoff tagging
  • backoff tagging
    • about / Backoff tagging
  • bagging
    • about / Bagging and random forests, Using bagging algorithms
  • bagging algorithms
    • using / Using bagging algorithms
  • Batch Normalization
    • about / Applying a CNN
  • BeautifulSoup
    • text data, cleaning / Text cleaning with BeautifulSoup
  • Best Matching Unit (BMU)
    • about / SOM – a primer
  • Bing Traffic API
    • about / Acquiring data via RESTful APIs, The Bing Traffic API
  • blend-of-blends / Using stacking ensembles
  • Blocks / Knowing when to use these libraries
  • boosting methods
    • applying / Applying boosting methods
    • Extreme Gradient Boosting (XGBoost), using / Using XGBoost
  • Borda count / Strategies to managing model robustness
  • Brill taggers
    • about / Backoff tagging

C

  • carp
    • about / Sequential tagging
  • Champion/Challenger / Strategies to managing model robustness
  • CIFAR-10 dataset
    • about / Understanding pooling layers
  • clustering
    • about / Clustering – a primer
  • completeness score
    • about / Kick-starting clustering analysis
  • composable layer / Understanding the convnet topology
  • Contrastive Pessimistic Likelihood Estimation (CPLE)
    • about / Introduction, Contrastive Pessimistic Likelihood Estimation
  • convnet topology
    • about / Understanding the convnet topology
    • pooling layers / Understanding pooling layers
    • training / Training a convnet
    • forward pass / Training a convnet
    • backward pass / Training a convnet
    • implementing / Putting it all together
  • convolutional neural networks (CNN)
    • about / Introducing the CNN, Introduction to TensorFlow
    • convnet topology / Understanding the convnet topology
    • convolution layers / Understanding convolution layers
    • applying / Applying a CNN
  • convolution layers
    • about / Understanding convolution layers
  • correlation
    • about / Correlation
  • covariance
    • about / PCA – a primer

D

  • data
    • acquiring, via Twitter / Twitter
  • deep belief network (DBN)
    • about / Deep belief networks
    • training / Training a DBN
    • applying / Applying the DBN
    • validating / Validating the DBN
  • DeepFace
    • about / Introducing the CNN
  • denoising autoencoders (dA)
    • about / Denoising autoencoders
    • applying / Applying a dA
  • DepthConcat element
    • about / Putting it all together
  • development tools
    • about / Alternative development tools
    • Lasagne / Alternative development tools
    • TensorFlow / Alternative development tools
    • libraries usage, deciding / Knowing when to use these libraries
  • Diabolo network
    • about / Autoencoders
  • dynamic applications
    • models, using / Using models in dynamic applications

E

  • eigenvalue
    • about / PCA – a primer
  • eigenvector
    • about / PCA – a primer
  • elbow method
    • about / Tuning your clustering configurations, Applying boosting methods
  • ensembles
    • about / Introducing ensembles
    • averaging ensembles / Understanding averaging ensembles
    • boosting methods, applying / Applying boosting methods
    • stacking ensembles, using / Using stacking ensembles
    • applying / Applying ensembles in practice
  • Extreme Gradient Boosting (XGBoost)
    • using / Using XGBoost
  • extremely randomized trees (ExtraTrees)
    • about / Using random forests

F

  • Fast Fourier Transform
    • about / Training a convnet
  • feature engineering
    • about / Introduction, Feature engineering in practice
    • data, acquiring via RESTful APIs / Acquiring data via RESTful APIs
    • variables, deriving / Deriving and selecting variables using feature engineering techniques
    • variables, selecting / Deriving and selecting variables using feature engineering techniques
    • weather API, creating / The weather API
  • feature engineering, for ML applications
    • about / Engineering features for ML applications
    • rescaling techniques, using / Using rescaling techniques to improve the learnability of features
    • effective derived variables, creating / Creating effective derived variables
    • non-numeric features, reinterpreting / Reinterpreting non-numeric features
  • feature selection
    • techniques, using / Using feature selection techniques
    • performing / Performing feature selection
    • correlation / Correlation
    • LASSO / LASSO
    • Recursive Feature Elimination (RFE) / Recursive Feature Elimination
    • genetic models / Genetic models
  • feature set
    • creating / Creating a feature set
    • feature engineering, for ML applications / Engineering features for ML applications
    • feature selection techniques, using / Using feature selection techniques
  • Fisher's discriminant ratio
    • about / Finessing your self-training implementation
  • Fully Connected layer
    • about / Putting it all together

G

  • genetic models
    • about / Genetic models
  • Gibbs sampling
    • about / Training
  • Gini Impurity (gini) / Using stacking ensembles
  • Go
    • about / Introducing the CNN
  • GoogLeNet / Introducing the CNN
    • about / Putting it all together
  • gradient descent algorithms
    • URL / Using rescaling techniques to improve the learnability of features

H

  • h-dimensional representation / Introducing the autoencoder
  • heart dataset
    • URL / Implementing self-training
  • hierarchical grouping
    • about / Stacked Denoising Autoencoders
  • homogeneity score
    • about / Kick-starting clustering analysis

I

  • i-dimensional input
    • about / Introducing the autoencoder
  • ImageNet
    • about / Introducing the CNN
  • Inception network
    • about / Putting it all together

K

  • k-means clustering
    • about / Introducing k-means clustering
    • clustering / Clustering – a primer
    • clustering analysis / Kick-starting clustering analysis
    • configuration, tuning / Tuning your clustering configurations
  • K-Nearest Neighbors (KNN) / Using bagging algorithms
  • Keras / Knowing when to use these libraries

L

  • Lasagne
    • about / Introduction to Lasagne, Getting to know Lasagne
  • LASSO
    • about / LASSO
  • LeNet
    • about / Putting it all together
  • libraries
    • usage, deciding / Knowing when to use these libraries

M

  • Markov Chain Monte Carlo (MCMC)
    • about / Training
  • max-pooling
    • about / Understanding pooling layers
  • mean-pooling
    • about / Understanding pooling layers
  • modeling risk factors
    • longitudinally variant / Identifying modeling risk factors
    • slow change / Identifying modeling risk factors
    • key parameter / Identifying modeling risk factors
  • models
    • using, in dynamic applications / Using models in dynamic applications
    • robustness / Understanding model robustness
    • modeling risk factors, identifying / Identifying modeling risk factors
    • robustness, managing / Strategies to managing model robustness
  • Motor Vehicle Accident (MVA) / Translink Twitter
  • Multi-Layer Perceptron (MLP)
    • about / The composition of a neural network
  • multicollinearity / Correlation

N

  • n-dimensional input
    • about / Denoising autoencoders
  • n-gram tagger
    • about / Sequential tagging
  • Natural Language Toolkit (NLTK)
    • about / Tagging and categorising words
    • used, for tagging / Tagging with NLTK
  • Network In Network (NIN)
    • about / Putting it all together
  • network topologies
    • about / Network topologies
  • neural networks
    • about / Neural networks – a primer
    • composition / The composition of a neural network
    • learning process / The composition of a neural network
    • neurons / The composition of a neural network
    • connectivity functions / The composition of a neural network
    • network topologies / Network topologies

O

  • OpinRank Review dataset
    • about / Applying the SdA
    • URL / Applying the SdA
  • orthogonalization
    • about / PCA – a primer
  • orthonormalization
    • about / PCA – a primer
  • overcomplete
    • about / Denoising autoencoders

P

  • Permanent Contrastive Divergence (PCD)
    • about / Training
  • Platt calibration
    • about / Implementing self-training
  • pooling layers
    • about / Understanding pooling layers
  • Porter stemmer
    • about / Stemming
  • Pragmatic Chaos model / Using stacking ensembles
  • price-earnings (P/E) ratio / Creating effective derived variables
  • principal component analysis (PCA)
    • about / Principal component analysis
    • features / PCA – a primer
    • employing / Employing PCA
  • Pylearn2 / Knowing when to use these libraries

R

  • random forests
    • about / Bagging and random forests, Testing our prepared data
    • using / Using random forests
  • random patches
    • about / Bagging and random forests, Using bagging algorithms
  • random subspaces
    • about / Using bagging algorithms
  • Rectified Linear Units (ReLU)
    • about / Putting it all together
  • Recursive Feature Elimination (RFE) / Performing feature selection
    • about / Recursive Feature Elimination
  • RESTful APIs
    • data, acquiring / Acquiring data via RESTful APIs
    • model performance, testing / Testing the performance of our model
  • Restricted Boltzmann Machine (RBM)
    • about / Restricted Boltzmann Machine, Introducing the RBM
    • topology / Topology
    • training / Training
    • applications / Applications of the RBM, Further applications of the RBM
  • Root Mean Squared Error (RMSE)
    • about / Genetic models

S

  • scikit-learn
    • about / Employing PCA
  • self-organizing maps (SOM)
    • about / Self-organizing maps, SOM – a primer, The composition of a neural network
    • employing / Employing SOM
  • self-training
    • about / Self-training
    • implementing / Implementing self-training
    • improving / Finessing your self-training implementation
    • selection process, improving / Improving the selection process
    • Contrastive Pessimistic Likelihood Estimation (CPLE) / Contrastive Pessimistic Likelihood Estimation
  • semi-supervised algorithms
    • using / Semi-supervised algorithms in action
  • semi-supervised learning
    • about / Introduction, Understanding semi-supervised learning
    • self-training / Self-training
  • sequential tagging
    • about / Sequential tagging
  • Silhouette Coefficient
    • about / Kick-starting clustering analysis
  • stacked denoising autoencoders (SdA)
    • about / Stacked Denoising Autoencoders
    • applying / Applying the SdA
    • performance, assessing / Assessing SdA performance
  • stacking ensembles
    • using / Using stacking ensembles
  • stemming
    • about / Stemming
  • Stochastic Gradient Descent (SGD)
    • about / Implementing self-training
  • stride
    • about / Understanding convolution layers
  • subtaggers
    • about / Backoff tagging
  • sum-pooling
    • about / Understanding pooling layers
  • Support Vector Classification (SVC)
    • about / Recursive Feature Elimination

T

  • tagging
    • with Natural Language Toolkit (NLTK) / Tagging with NLTK
    • sequential tagging / Sequential tagging
    • backoff tagging / Backoff tagging
  • TB-scale datasets
    • about / Clustering – a primer
  • tensor / Understanding convolution layers
  • TensorFlow
    • about / Introduction to TensorFlow, Getting to know TensorFlow
    • using / Using TensorFlow to iteratively improve our models
  • TensorFlow library
    • about / Understanding convolution layers
  • text data
    • cleaning / Cleaning text data
    • cleaning, with BeautifulSoup / Text cleaning with BeautifulSoup
    • punctuation, managing / Managing punctuation and tokenizing
    • tokenisation, managing / Managing punctuation and tokenizing
    • words, categorizing / Tagging and categorising words
    • words, tagging / Tagging and categorising words
    • features, creating / Creating features from text data
  • text feature engineering
    • about / Text feature engineering
    • text data, cleaning / Cleaning text data
    • stemming / Stemming
    • bagging / Bagging and random forests
    • random forests / Bagging and random forests
    • prepared data, testing / Testing our prepared data
  • Theano
    • about / Denoising autoencoders
  • tokenisation
    • about / Managing punctuation and tokenizing
  • transforming autoencoder
    • about / Understanding pooling layers
  • translation-invariance
    • about / Understanding pooling layers
  • Translink Twitter
    • about / Translink Twitter
  • trigram tagger
    • about / Sequential tagging
  • Twitter
    • using / Twitter
    • Translink Twitter, using / Translink Twitter
    • consumer comments, analyzing / Consumer comments
    • Bing Traffic API / The Bing Traffic API

U

  • U-Matrix
    • about / Employing SOM
  • unigram tagger
    • about / Sequential tagging

V

  • v-fold cross-validation
    • about / Tuning your clustering configurations
  • validity measure (v-measure)
    • about / Kick-starting clustering analysis

W

  • weather API
    • creating / The weather API

Y

  • Yahoo Weather API
    • about / Acquiring data via RESTful APIs

Z

  • Zipf distribution
    • about / Reinterpreting non-numeric features