Discovering Leave-One-Out cross-validation
Essentially, Leave-One-Out (LOO) cross-validation is just k-fold cross-validation where k = n, with n being the number of samples. This means that in each fold there are n - 1 samples in the training set and exactly 1 sample in the validation set (see Figure 1.3). This is a very computationally expensive strategy, since n models must be trained, and it produces a performance estimate with very high variance:
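To make the equivalence concrete, here is a minimal sketch (using a toy NumPy array, not data from this chapter) showing that Scikit-Learn's LeaveOneOut produces exactly the same splits as unshuffled KFold with n_splits equal to the number of samples:

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(10).reshape(5, 2)  # toy dataset with n = 5 samples

loo_splits = list(LeaveOneOut().split(X))
kfold_splits = list(KFold(n_splits=len(X)).split(X))

# LOO is identical to k-fold with k = n: each fold holds out one sample
for (tr_a, va_a), (tr_b, va_b) in zip(loo_splits, kfold_splits):
    assert np.array_equal(tr_a, tr_b)
    assert np.array_equal(va_a, va_b)
```

Because every sample is held out exactly once, both iterators yield n folds, which is why the cost grows linearly with the dataset size.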
Figure 1.3 – LOO cross-validation
So, when is LOO preferred over k-fold cross-validation? LOO works best when you have a very small dataset. It is also a good choice over k-fold if you value a high-confidence estimate of the model's performance more than you care about the computational cost.
Implementing this strategy from scratch is actually very simple: we just need to loop through each index of the data and do some data manipulation. However, the Scikit-Learn package also provides an implementation of LOO, which we can use:
from sklearn.model_selection import train_test_split, LeaveOneOut

df_cv, df_test = train_test_split(df, test_size=0.2, random_state=0)

loo = LeaveOneOut()
for train_index, val_index in loo.split(df_cv):
    df_train, df_val = df_cv.iloc[train_index], df_cv.iloc[val_index]
    # perform training or hyperparameter tuning here
Notice that no argument is passed to LeaveOneOut, since this strategy is very straightforward and involves no stochastic procedure. There is also no stratified version of LOO, since the validation set always contains exactly one sample.
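As mentioned earlier, the same splits are easy to generate by hand. The following sketch (on a toy NumPy array, with the model-training step left as a comment) loops over each index and separates the held-out sample from the rest:

```python
import numpy as np

data = np.arange(6).reshape(3, 2)  # toy dataset with n = 3 samples
n = len(data)

splits = []
for i in range(n):
    val = data[i:i + 1]                 # the single held-out sample
    train = np.delete(data, i, axis=0)  # the remaining n - 1 samples
    # perform training or hyperparameter tuning here
    splits.append((train, val))

assert len(splits) == n  # one fold per sample
```

This hand-rolled loop behaves like LeaveOneOut().split(data); in practice the Scikit-Learn version is preferable because it returns index arrays that work with both NumPy arrays and pandas DataFrames.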
Now that you are familiar with the concept of LOO, in the next section we will learn about a slight variation of it.