AdaBoost can also be interpreted as a forward stagewise approach to minimizing an exponential loss function for binary labels y ∈ {-1, 1}: at each iteration m, it identifies a new base learner h_m and a corresponding weight α_m to be added to the ensemble, as shown in the following formula:

$$
(\alpha_m, h_m) = \arg\min_{\alpha,\, h} \sum_{i=1}^{n} \exp\!\left(-y_i \left(F_{m-1}(x_i) + \alpha\, h(x_i)\right)\right)
$$

Here, F_{m-1} denotes the ensemble built up over the previous m - 1 iterations.
This interpretation of the AdaBoost algorithm was only discovered several years after its publication. It views AdaBoost as a coordinate descent algorithm that minimizes a particular loss function, namely the exponential loss.
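To make this concrete, here is a minimal sketch of the stagewise minimization above, assuming NumPy arrays and scikit-learn decision stumps as base learners; the function name adaboost_exp_loss, the number of stages, and the choice of depth-1 trees are illustrative assumptions, not part of the original derivation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_exp_loss(X, y, n_stages=50):
    """Stagewise minimization of exponential loss; y is a NumPy array in {-1, +1}."""
    F = np.zeros(len(y))              # ensemble predictions F_{m-1}(x_i), initially 0
    learners, alphas = [], []
    for m in range(n_stages):
        # The exponential loss of the current ensemble acts as sample weights
        w = np.exp(-y * F)
        w /= w.sum()
        # Minimizing the weighted error over h yields the new base learner h_m
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        # The closed-form minimizer over alpha recovers the familiar AdaBoost weight
        err = np.sum(w * (pred != y))
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        F += alpha * pred             # add alpha_m * h_m(x) to the ensemble
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas
```

Note that the sample weights and the closed-form α_m fall directly out of the exponential loss, which is precisely why this stagewise view reproduces the original AdaBoost update rules.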
Gradient boosting leverages this insight and applies the boosting method to a much wider range of loss functions. The method enables the design of machine learning algorithms to solve any regression, classification, or ranking problem, as long as it can be formulated using a loss function that is differentiable and thus has a gradient.
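As an illustration of this generic recipe, the following sketch fits each new base learner to the negative gradient of the loss at the current predictions (the pseudo-residuals). The names gradient_boost and loss_gradient are hypothetical, and squared-error loss is used here only as an example of a differentiable loss:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, loss_gradient, n_stages=100, learning_rate=0.1):
    """Generic gradient boosting: each stage fits the negative loss gradient."""
    F = np.full(len(y), np.mean(y))   # start from a constant prediction
    trees = []
    for m in range(n_stages):
        # Pseudo-residuals: the negative gradient of the loss at the current F
        residuals = -loss_gradient(y, F)
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees

def squared_error_gradient(y, F):
    # For L(y, F) = (y - F)^2 / 2, the gradient w.r.t. F is F - y,
    # so the pseudo-residuals reduce to the ordinary residuals y - F
    return F - y

# Example usage: trees = gradient_boost(X, y, squared_error_gradient)
```

Swapping in the gradient of a different differentiable loss leaves the rest of the loop unchanged, which is exactly the flexibility described above.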