The Explore-then-Commit Algorithm
We mentioned that a potential reason for poor performance of the Greedy algorithm in some cases is committing too early when a sufficient number of sample rewards from each arm have not been observed. The Explore-then-commit algorithm attempts to address this problem by formalizing the number of rounds that should be spent exploring each arm at the beginning of the process.
Specifically, each Explore-then-commit algorithm is parameterized by a number, T. In each bandit problem, an Explore-then-commit algorithm will spend exactly T rounds pulling each of the available arms. Only after these forced exploration rounds does the algorithm start choosing the arm with the greatest reward average. Greedy is a special case of the Explore-then-commit algorithm where T is set to 1. This general algorithm, therefore, gives us the option to customize this parameter and set it appropriately, depending on the situation.
The implementation of this algorithm...