Machine Learning Quick Reference

By: Rahul Kumar

Overview of this book

Machine learning makes it possible to learn about the unknowns and gain hidden insights into your datasets by mastering many tools and techniques. This book guides you to do just that in a very compact manner. After giving a quick overview of what machine learning is all about, Machine Learning Quick Reference jumps right into its core algorithms and demonstrates how they can be applied to real-world scenarios. From evaluating models to optimizing their performance, this book will introduce you to the best practices in machine learning. You will also look at more advanced aspects, such as training neural networks and working with different kinds of data, such as text, time-series, and sequential data. Advanced methods and techniques such as causal inference, deep Gaussian processes, and more are also covered. By the end of this book, you will be able to train fast, accurate machine learning models, and you will have a point of reference that you can easily return to.

H-measure

Binary classification applies techniques that map independent variables to one of two labels. For example, variables such as gender, income, number of existing loans, and whether payments are made on time are mapped to a score that helps us classify customers into good customers (higher propensity to pay) and bad customers.

Typically, everyone relies on the misclassification rate or some derived form of it, or on the area under the curve (AUC), which is widely considered the best evaluator of a classification model. The misclassification rate is the number of misclassified examples divided by the total number of examples. But does this give us a fair assessment? Not quite: the misclassification rate keeps something important under wraps, because it weighs every error equally. More often than not, classifiers come with a tuning parameter whose side effect is to favor false positives over false negatives, or vice versa.

Picking the AUC as the sole model evaluator is a double whammy. The AUC implicitly applies different misclassification costs to different classifiers, which is not desirable. In effect, using it is equivalent to evaluating different classification rules with different metrics.
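To make the point about the misclassification rate concrete, here is a minimal Python sketch. The labels, scores, and thresholds are made-up illustration data (not from the book), and the decision threshold stands in for the tuning parameter mentioned above. Three thresholds give the same misclassification rate while shifting the errors entirely from false positives to false negatives.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, tn, fn), predicting class 1 when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def misclassification_rate(labels, scores, threshold):
    """Misclassified examples divided by the total number of examples."""
    _, fp, _, fn = confusion_counts(labels, scores, threshold)
    return (fp + fn) / len(labels)


# Hypothetical data: four class 0 and four class 1 examples with model scores.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.90]

for threshold in (0.35, 0.50, 0.65):
    _, fp, _, fn = confusion_counts(labels, scores, threshold)
    rate = misclassification_rate(labels, scores, threshold)
    print(f"threshold={threshold:.2f}  FP={fp}  FN={fn}  rate={rate:.2f}")
```

All three thresholds misclassify two of the eight examples, so the rate alone cannot tell them apart, even though a lender would care a great deal about which kind of error is being made.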

As we have already discussed, the real test of any classifier takes place on unseen data, where performance typically drops by a few decimal points. Worse, in scenarios like the preceding one, the decision support system will not perform well and will start producing misleading results.

The H-measure overcomes the problem of incurring different misclassification costs for different classifiers. It takes a severity ratio as input, which specifies how much more severe misclassifying a class 0 instance is than misclassifying a class 1 instance:

Severity Ratio = cost_0/cost_1

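Here, cost_0 > 0 is the cost of misclassifying a class 0 datapoint as class 1 (a false positive), and cost_1 > 0 is the cost of misclassifying a class 1 datapoint as class 0 (a false negative).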

It is sometimes more convenient to consider the normalized cost c = cost_0/(cost_0 + cost_1) instead. For example, severity.ratio = 2 implies that a false positive costs twice as much as a false negative, which corresponds to c = 2/3.
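The relationship between the severity ratio and the normalized cost can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the error counts are hypothetical, and the code computes a cost-weighted loss at a fixed operating point, not the H-measure itself. Dividing the numerator and denominator of c by cost_1 gives c = SR/(1 + SR), where SR is the severity ratio.

```python
def normalized_cost(severity_ratio):
    """Convert a severity ratio cost_0/cost_1 into c = cost_0/(cost_0 + cost_1)."""
    if severity_ratio <= 0:
        raise ValueError("severity_ratio must be positive")
    return severity_ratio / (1.0 + severity_ratio)


def cost_weighted_loss(false_positives, false_negatives, n_examples, severity_ratio):
    """Average loss per example: false positives weigh c, false negatives 1 - c."""
    c = normalized_cost(severity_ratio)
    return (c * false_positives + (1.0 - c) * false_negatives) / n_examples


# Hypothetical (false positives, false negatives) counts for three operating
# points evaluated on 8 examples; each has the same misclassification rate.
candidates = {"lenient": (2, 0), "balanced": (1, 1), "strict": (0, 2)}

# Assumed severity ratio of 2: a false positive costs twice a false negative.
losses = {
    name: cost_weighted_loss(fp, fn, n_examples=8, severity_ratio=2.0)
    for name, (fp, fn) in candidates.items()
}
best = min(losses, key=losses.get)
print(losses)
print("lowest cost-weighted loss:", best)
```

Once the costs are stated explicitly, operating points with identical misclassification rates are no longer tied: with false positives twice as costly, the point that makes no false positives has the lowest loss.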
