Cross validation
Cross validation is one of the most underrated processes in data science and analytics, although it is very popular among practitioners of competitive data science. It is a model evaluation method: it gives the analyst an idea of how well the model would perform on new data that it has not yet seen. It is also used extensively to detect and avoid overfitting, which occurs when a model fits the training set too precisely and consequently makes inaccurate or high-error predictions on the testing set.
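To make the idea of scoring a model on unseen data concrete, here is a minimal sketch in plain Julia. It assumes Julia 1.x and uses only the standard library rather than MLBase. The helper names kfold_indices and cv_mse, the "predict the training mean" model, and the toy data are illustrative assumptions, not part of this recipe or of any package. The data is split into k parts, and each part is held out in turn while the model is fitted on the rest.

```julia
using Random      # shuffle, MersenneTwister
using Statistics  # mean

# Hypothetical helper (not an MLBase function): shuffle the indices 1:n
# and deal them into k held-out folds of roughly equal size.
function kfold_indices(n::Int, k::Int; rng = MersenneTwister(42))
    perm = shuffle(rng, collect(1:n))
    return [perm[i:k:n] for i in 1:k]
end

# Hypothetical helper: estimate the out-of-sample mean squared error of a
# trivial model that always predicts the mean of its training data.
function cv_mse(y::Vector{Float64}, k::Int)
    n = length(y)
    scores = Float64[]
    for test_idx in kfold_indices(n, k)
        train_idx = setdiff(1:n, test_idx)   # every index that is not held out
        prediction = mean(y[train_idx])      # "fit" on the training folds only
        # score the prediction on the fold the model has not seen
        push!(scores, mean((y[test_idx] .- prediction) .^ 2))
    end
    return mean(scores)                      # average error across the k folds
end

y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]         # toy data, for illustration only
println(cv_mse(y, 3))
```

Because every observation is held out exactly once, the averaged score reflects performance on data the model was not fitted to, which is what makes a large gap between training error and this score a sign of overfitting.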
Getting ready
To get ready, the MLBase library has to be installed and imported. As we already installed it for the Preprocessing recipe, we don't need to install it again; we can import it directly with the using MLBase command, as follows:
using MLBase
How to do it...
First, we will look at k-fold cross-validation, which is one of the most popular cross-validation methods. The input data...