Understanding TPE
TPE is another variant of BO that performs well in general and can be used for both categorical and continuous hyperparameters. Unlike BOGP, which has cubic time complexity in the number of observations, TPE runs in linear time. TPE is a good choice if you have a huge hyperparameter space and a very tight budget for evaluating the cross-validation score.
The main difference between TPE and BOGP or SMAC lies in how it models the relationship between hyperparameters and the cross-validation score. BOGP and SMAC approximate the value of the objective function given the hyperparameters, that is, the posterior probability $p(y \mid x)$, where $x$ denotes the hyperparameters and $y$ the value of the objective function. TPE works the other way around: it tries to get the optimal hyperparameters based on the condition of the objective function, that is, the likelihood probability $p(x \mid y)$ (see the explanation of Bayes Theorem in the Understanding BO GP section).
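To make the idea of modeling the hyperparameters conditioned on the score more concrete, here is a minimal, one-dimensional sketch in pure Python. It is an illustration under simplifying assumptions, not the implementation used by any particular library: the function names (kde, tpe_suggest), the fixed bandwidth, the gamma quantile of 0.25, and the toy objective are all hypothetical choices made for this example, and a fixed-bandwidth Gaussian kernel density estimate stands in for the adaptive Parzen estimator used in practice. Past trials are split into a "good" and a "bad" group by score, one density is fitted over the hyperparameter values in each group, and the next candidate is the one that is most likely under the good density relative to the bad one.

```python
import math
import random


def kde(points, bandwidth):
    """Return a fixed-bandwidth Gaussian kernel density estimate over `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def tpe_suggest(xs, ys, low, high, gamma=0.25, n_candidates=50, bandwidth=0.5, rng=random):
    """Suggest the next x to try, assuming lower y is better and len(xs) >= 2."""
    # Split past trials at the gamma quantile of the observed scores.
    order = sorted(range(len(xs)), key=lambda i: ys[i])
    n_good = max(1, math.ceil(gamma * len(xs)))
    good = [xs[i] for i in order[:n_good]]
    bad = [xs[i] for i in order[n_good:]]

    good_density = kde(good, bandwidth)  # density of x among well-scoring trials
    bad_density = kde(bad, bandwidth)    # density of x among the remaining trials

    # Sample candidates near the good trials, clipped to the search bounds.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    # Keep the candidate with the largest good-to-bad density ratio.
    return max(candidates, key=lambda x: good_density(x) / (bad_density(x) + 1e-12))


def objective(x):
    """Toy stand-in for a cross-validation loss (hypothetical)."""
    return (x - 2.0) ** 2


rng = random.Random(0)
xs = [rng.uniform(-5.0, 5.0) for _ in range(5)]  # random warm-up trials
ys = [objective(x) for x in xs]

for _ in range(30):
    x_next = tpe_suggest(xs, ys, -5.0, 5.0, rng=rng)
    xs.append(x_next)
    ys.append(objective(x_next))

best_x = xs[ys.index(min(ys))]
print("best x found:", best_x, "with score:", min(ys))
```

Note that the sketch never fits a model that predicts the score from the hyperparameters. It only fits densities over the hyperparameter values, and each suggestion costs time proportional to the number of past trials, which is in line with the linear running time mentioned earlier.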
In other words, unlike BOGP or SMAC, which construct a predictive distribution over the objective function, TPE tries to utilize the information...