Understanding hyperparameter tuning
When first faced with a long list of model training parameters and their possible values, you might think that successfully training a model requires a special superpower for picking the right parameter for the right scenario. This isn't necessarily true. While experience may help you narrow down the set of plausible hyperparameters, there usually isn't a reliable way of knowing with certainty in advance which hyperparameter values are best.
Let's imagine the simplest possible scenario – a sequence tagging model trainer that receives a single parameter – say, a learning rate. This is generally a value between 0 (exclusive) and 1. To create a set of possible hyperparameter values, we simply discretize the range into a set of 10 candidates: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]. We can then perform the most trivial type of hyperparameter optimization by training 10 different...
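This trivial approach – training one model per candidate value and keeping the best – can be sketched in a few lines of Python. Here, `train_and_evaluate` is a hypothetical stand-in for a real sequence tagging trainer (a toy scoring function is used so the sketch is runnable):

```python
# Minimal grid search over a single hyperparameter (the learning rate).

def train_and_evaluate(learning_rate: float) -> float:
    """Hypothetical stand-in for training a sequence tagging model and
    returning a validation score. This toy function pretends the score
    peaks near a learning rate of 0.3."""
    return 1.0 - abs(learning_rate - 0.3)

# Discretize the (0, 1] range into 10 candidate learning rates.
candidates = [round(0.1 * i, 1) for i in range(1, 11)]  # [0.1, ..., 1.0]

# Train one model per candidate and keep the best-scoring learning rate.
scores = {lr: train_and_evaluate(lr) for lr in candidates}
best_lr = max(scores, key=scores.get)
```

In a real setting, `train_and_evaluate` would train a full model and report a held-out validation metric, so each candidate costs one complete training run – which is exactly why this brute-force approach becomes expensive as the number of hyperparameters grows.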