The AlphaGo Zero method
- We constantly traverse the game tree using the Monte Carlo tree search (MCTS) algorithm. Its core idea is to walk down the game states semi-randomly, expanding them and gathering statistics about how often each move is taken and which game outcomes follow. Because the game tree is huge in both depth and width, we don't try to build the full tree; we just randomly sample its most promising paths (that's the source of the method's name).
- At every moment, we have a best player, which is the model used to generate the data via self-play. Initially, this model has random weights...
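
To make the tree traversal idea concrete, here is a minimal sketch of a generic MCTS loop on a toy game, not the AlphaGo Zero implementation. Everything in it is an assumption made for illustration: the game is a tiny Nim variant (take 1 to 3 stones, whoever takes the last stone wins), moves are selected with the classic UCB1 formula, and positions are evaluated with random playouts. AlphaGo Zero itself guides the search with a neural network rather than random playouts. The names `Node`, `legal_moves`, `rollout`, `select_child`, and `mcts` are hypothetical.

```python
import math
import random


class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones    # stones left on the table
        self.parent = parent
        self.children = {}      # move -> Node
        self.visits = 0
        self.wins = 0.0         # wins for the player who moved INTO this node


def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]


def rollout(stones, rng):
    """Play random moves; return True if the player to move first wins."""
    first_player_to_move = True
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return first_player_to_move
        first_player_to_move = not first_player_to_move
    return False  # no stones left: the previous player took the last one


def select_child(node, c=1.4):
    """UCB1: trade off the observed win rate against an exploration bonus."""
    def ucb(child):
        if child.visits == 0:
            return math.inf
        exploit = child.wins / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children.values(), key=ucb)


def mcts(root_stones, simulations=2000, seed=0):
    """Return the visit count of every move available at the root."""
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(simulations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.stones > 0 and len(node.children) == len(legal_moves(node.stones)):
            node = select_child(node)
        # 2. Expansion: add one untried move (unless the game is over).
        if node.stones > 0:
            untried = [m for m in legal_moves(node.stones) if m not in node.children]
            move = rng.choice(untried)
            child = Node(node.stones - move, parent=node)
            node.children[move] = child
            node = child
        # 3. Simulation: random playout from the new position.
        mover_wins = not rollout(node.stones, rng)
        # 4. Backpropagation: flip the perspective at every level.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_wins else 0.0
            mover_wins = not mover_wins
            node = node.parent
    return {move: child.visits for move, child in root.children.items()}


if __name__ == "__main__":
    print(mcts(5))
```

With five stones, the search should concentrate most of its visits on taking one stone, which leaves the opponent in a losing position. The visit counts are the "frequency of moves" statistics mentioned above: only the promising branch is explored deeply, while the rest of the tree is sampled sparsely.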