Mastering Predictive Analytics with R

By: Rui Miguel Forte

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

Gearing Up for Predictive Modeling

Types of models

The process of predictive modeling

Performance metrics

Linear Regression

Introduction to linear regression

Simple linear regression

Multiple linear regression

Assessing linear regression models

Problems with linear regression

Feature selection

Logistic Regression

Classifying with linear regression

Introduction to logistic regression

Predicting heart disease

Assessing logistic regression models

Regularization with the lasso

Classification metrics

Extensions of the binary logistic classifier

Neural Networks

The biological neuron

The artificial neuron

Stochastic gradient descent

Multilayer perceptron networks

Predicting the energy efficiency of buildings

Predicting glass type revisited

Predicting handwritten digits

Support Vector Machines

Maximal margin classification

Support vector classification

Kernels and support vector machines

Predicting chemical biodegradation

Cross-validation

Predicting credit scores

Multiclass classification with support vector machines

Tree-based Methods

The intuition for tree models

Algorithms for training decision trees

Predicting class membership on synthetic 2D data

Predicting the authenticity of banknotes

Predicting complex skill learning

Ensemble Methods

Predicting atmospheric gamma ray radiation

Predicting complex skill learning with boosting

Probabilistic Graphical Models

A little graph theory

Conditional independence

Bayesian networks

The Naïve Bayes classifier

Hidden Markov models

Predicting promoter gene sequences

Predicting letter patterns in English words

Time Series Analysis

Fundamental concepts of time series

Some fundamental time series

Stationary time series models

Non-stationary time series models

Predicting intense earthquakes

Predicting lynx trappings

Predicting foreign exchange rates

Other time series models

Topic Modeling

An overview of topic modeling

Latent Dirichlet Allocation

Modeling the topics of online news stories

Recommendation Systems

Collaborative filtering

Singular value decomposition

Predicting recommendations for movies and jokes

Loading and preprocessing the data

Exploring the data

Other approaches to recommendation systems

Index

Cross-validation

We've seen that in the real world we often find ourselves without a test data set that we can use to measure the performance of our model on unseen data. The most typical reason is that we have very little data overall and want to use all of it to train our model. Another situation is that we want to keep a sample of the data as a validation set to tune model meta-parameters, such as cost and gamma for SVMs with radial kernels. As a result, we've already reduced our starting data and don't want to reduce it further.
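To make the cost of holding data out concrete, here is a minimal sketch of a three-way split into training, validation, and test sets. The book's own examples are in R. This sketch is in Python, and the function name `three_way_split` and the 20%/20% fractions are illustrative assumptions. Every row assigned to validation or test is a row the model never gets to fit.

```python
import random

def three_way_split(n, val_frac, test_frac, seed=0):
    """Hypothetical helper: shuffle row indices 0..n-1 and carve off
    validation and test sets. Returns (train, validation, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    n_val = int(round(n * val_frac))
    n_test = int(round(n * test_frac))
    val = idx[:n_val]                         # used only to tune meta-parameters
    test = idx[n_val:n_val + n_test]          # used only for the final estimate
    train = idx[n_val + n_test:]              # what is left for fitting
    return train, val, test

train, val, test = three_way_split(100, val_frac=0.2, test_frac=0.2)
print(len(train), len(val), len(test))  # 60 20 20
```

With only 100 observations, reserving 20 for validation and 20 for testing leaves just 60 for training. This is the squeeze that motivates estimating performance without a separate test set.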

Whatever the reason for the lack of a test data set, we already know that we should never use our training data to measure model performance and generalization, because of the problem of overfitting. This is especially relevant for powerful and expressive nonlinear models, such as neural networks and SVMs with radial kernels, which are often capable of approximating the training data very closely...
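The following sketch contrasts the error measured on the training data with a k-fold cross-validation estimate, which reuses the available data without a separate test set. It is written in Python rather than the book's R and makes several assumptions:

- A 1-nearest-neighbour classifier stands in for the expressive models mentioned above. It is not an SVM or a neural network.
- The data is synthetic and noisy.
- All function names are hypothetical.

```python
import random

def make_noisy_data(n, noise, seed=0):
    """Hypothetical 1-D, two-class data: the true label is 1 when x > 0.5,
    then each label is flipped with probability `noise`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise that a good model should not memorize
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """1-nearest-neighbour: copy the label of the closest training point.
    Flexible enough to reproduce its own training labels exactly."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def error_rate(train, evaluate_on):
    """Fraction of points in `evaluate_on` misclassified by 1-NN fitted on `train`."""
    wrong = sum(1 for x, y in evaluate_on if predict_1nn(train, x) != y)
    return wrong / len(evaluate_on)

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validation_error(data, k=10, seed=0):
    """Average held-out error over k folds; each point is held out exactly once."""
    errors = []
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        errors.append(error_rate(train, test))
    return sum(errors) / len(errors)

data = make_noisy_data(200, noise=0.2, seed=1)
train_err = error_rate(data, data)           # evaluated on the data used for fitting
cv_err = cross_validation_error(data, k=10)  # each fold scored on points it never saw
print(f"training error: {train_err:.2f}, 10-fold CV error: {cv_err:.2f}")
```

Because every training point is its own nearest neighbour, the training error comes out as exactly zero. The cross-validated estimate stays well above zero because about a fifth of the labels are noise. The gap is the overfitting that training-set evaluation hides.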