Book Image

Mastering Predictive Analytics with R

By : Rui Miguel Forte, Rui Miguel Forte
Book Image

Mastering Predictive Analytics with R

By: Rui Miguel Forte, Rui Miguel Forte

Overview of this book

Table of Contents (19 chapters)
Mastering Predictive Analytics with R
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

Cross-validation


We've seen that many times in the real world, we come across a situation where we don't have an available test data set that we can use in order to measure the performance of our model on unseen data. The most typical reason is that we have very few data overall and want to use all of it to train our model. Another situation is that we want to keep a sample of the data as a validation set to tune some model meta parameters such as cost and gamma for SVMs with radial kernels, and as a result, we've already reduced our starting data and don't want to reduce it further.

Whatever the reason for the lack of a test data set, we already know that we should never use our training data as a measure of model performance and generalization because of the problem of overfitting. This is especially relevant for powerful and expressive models such as the nonlinear models of neural networks and SVMs with radial kernels that are often capable of approximating the training data very closely...