The R implementation of some techniques, such as classification and regression trees, performs cross-validation out of the box to aid in model selection and to avoid overfitting. However, many others do not. When choosing among several machine learning methods for a particular problem, the standard approach is to partition the data into training and test sets and pick the method that performs best on the test set. Cross-validation, however, gives a more thorough evaluation of a model's performance on held-out data, so comparing methods with cross-validation paints a truer picture of their relative performance.
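The idea can be sketched in a few lines of base R. The dataset (mtcars), model (lm), and choice of k = 5 below are illustrative assumptions, not the recipe's own code; the recipe that follows works through a fuller example:

## A minimal sketch of manual k-fold cross-validation in base R.
## The data, model, and k are illustrative choices only.
set.seed(42)                                           # reproducible fold assignment
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))   # assign each row to a fold

cv.mse <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]                        # k-1 folds for training
  test  <- mtcars[folds == i, ]                        # held-out fold for evaluation
  fit   <- lm(mpg ~ wt + hp, data = train)
  mean((test$mpg - predict(fit, test))^2)              # test MSE on the held-out fold
})

mean(cv.mse)                                           # average error across the k folds

Repeating this for each candidate method and comparing the averaged errors gives the kind of like-for-like comparison described above.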
Performing k-fold cross-validation
Getting ready
We illustrate the approach with the Boston...