Hyperparameter Tuning with Python

By: Louis Owen

Overview of this book

Hyperparameters are an important element in building useful machine learning models. This book curates numerous hyperparameter tuning methods for Python, one of the most popular programming languages for machine learning. Alongside in-depth explanations of how each method works, you will use a decision map that can help you identify the best tuning method for your requirements. You'll start with an introduction to hyperparameter tuning and understand why it's important. Next, you'll learn the best methods for hyperparameter tuning for a variety of use cases and specific algorithm types. This book covers not only the usual grid and random search but also other powerful underdog methods. Individual chapters are dedicated to the four main groups of hyperparameter tuning methods: exhaustive search, heuristic search, Bayesian optimization, and multi-fidelity optimization. Later, you will learn about top frameworks like Scikit, Hyperopt, Optuna, NNI, and DEAP to implement hyperparameter tuning. Finally, you will cover the hyperparameters of popular algorithms and best practices that will help you tune your hyperparameters efficiently. By the end of this book, you will have the skills you need to take full control over your machine learning models and get the best models for the best results.
Table of Contents (19 chapters)
Section 1: The Methods
Section 2: The Implementation
Section 3: Putting Things into Practice

Discovering LPO cross-validation

Leave-P-Out (LPO) cross-validation is a variation of the Leave-One-Out (LOO) cross-validation strategy, where the validation set in each fold contains p samples instead of only 1 sample. Similar to LOO, this strategy ensures that we get all possible combinations of train-validation pairs. To be more precise, there will be $\binom{n}{p} = \frac{n!}{p!\,(n-p)!}$ folds, assuming there are n samples in our data. For example, there will be $\binom{50}{5}$ or 2,118,760 folds if we want to perform Leave-5-Out cross-validation on data that has 50 samples.
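The fold count is simply the number of ways to choose p validation samples out of n, so it can be checked with the standard library. The following is a minimal sketch; the helper name lpo_fold_count is hypothetical and is not part of scikit-learn:

```python
from itertools import combinations
from math import comb


def lpo_fold_count(n_samples: int, p: int) -> int:
    """Number of LPO folds: ways to choose p validation samples out of n."""
    return comb(n_samples, p)


# Cross-check the formula by enumerating every validation set of a tiny dataset.
tiny_folds = list(combinations(range(6), 2))
assert len(tiny_folds) == lpo_fold_count(6, 2)

# The fold count grows very quickly with the number of samples.
for n in (10, 30, 50, 100):
    print(f"n={n:>3}, p=5 -> {lpo_fold_count(n, 5):,} folds")
```

Even for p = 5, going from 30 to 50 samples raises the fold count from 142,506 to 2,118,760, which is why LPO is usually reserved for small datasets.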

LPO is suitable when you have a small number of samples and want even higher confidence in the model's estimated performance than LOO provides. With a large number of samples, however, the number of folds explodes, which makes LPO computationally impractical.

This strategy differs from k-fold and LOO in how the validation sets overlap. For p > 1, LPO produces overlapping validation sets, because each sample appears in many different subsets of size p, whereas k-fold and LOO always produce non-overlapping validation sets. For the same reason, LPO is not equivalent to k-fold with k = n // p: k-fold partitions the data into disjoint validation sets, while LPO enumerates every possible validation set of size p. The following code shows how to perform LPO cross-validation:

from sklearn.model_selection import train_test_split, LeavePOut

# Hold out 20% of the data as the final test set
df_cv, df_test = train_test_split(df, test_size=0.2, random_state=0)

# Each fold uses 2 samples for validation and the rest for training
lpo = LeavePOut(p=2)
for train_index, val_index in lpo.split(df_cv):
    df_train, df_val = df_cv.iloc[train_index], df_cv.iloc[val_index]
    # Perform training or hyperparameter tuning here

Unlike LeaveOneOut, LeavePOut requires the p argument, which sets the number of samples held out for validation in each fold.
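The difference in overlap between LPO and k-fold can be checked by counting how many validation sets each sample falls into. The following is a small sketch on toy data, assuming 6 samples and p = 2; the array X and the helper validation_counts are illustrative and not part of the original example:

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import KFold, LeavePOut

# Toy data: 6 samples with a single feature (illustrative only)
X = np.arange(6).reshape(-1, 1)


def validation_counts(splitter, X):
    """Count how many validation sets each sample index appears in."""
    counts = Counter()
    for _, val_index in splitter.split(X):
        counts.update(val_index.tolist())
    return counts


lpo_counts = validation_counts(LeavePOut(p=2), X)
kfold_counts = validation_counts(KFold(n_splits=3), X)  # k = n // p = 6 // 2

print("LPO folds:", LeavePOut(p=2).get_n_splits(X))
print("k-fold folds:", KFold(n_splits=3).get_n_splits(X))
print("LPO appearances per sample:", dict(lpo_counts))
print("k-fold appearances per sample:", dict(kfold_counts))
```

With k-fold, every sample is used for validation exactly once, whereas with LPO each sample is paired with every other sample and therefore appears in several validation sets.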

In this section, we have learned about the variations of the LOO cross-validation strategy. In the next section, we will learn how to perform cross-validation on time-series data.