We have covered a great deal in this chapter and looked at a lot of Python code. We talked a bit about the theory of discrete-state, zero-sum games. We showed how min-max can be used to evaluate the best moves in positions, and how evaluation functions allow min-max to operate on games where the state space of possible moves and positions is too vast to search exhaustively.
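As a reminder of the core idea, here is a minimal depth-limited min-max sketch. The `evaluate`, `moves`, and `apply_move` callbacks are hypothetical stand-ins for a concrete game's evaluation function, move generator, and move application, not part of any code shown earlier:

```python
def minimax(state, depth, maximizing, evaluate, moves, apply_move):
    """Depth-limited min-max over a game tree.

    evaluate, moves, and apply_move are game-specific callbacks
    (hypothetical names used here for illustration).
    """
    legal = moves(state)
    # At the depth limit or a terminal position, fall back on the
    # evaluation function instead of searching further.
    if depth == 0 or not legal:
        return evaluate(state)
    if maximizing:
        return max(minimax(apply_move(state, m), depth - 1, False,
                           evaluate, moves, apply_move) for m in legal)
    return min(minimax(apply_move(state, m), depth - 1, True,
                       evaluate, moves, apply_move) for m in legal)
```

The evaluation function is what makes the depth limit workable: instead of searching to the end of the game, the search bottoms out at a heuristic estimate of the position's value.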
For games where no good evaluation function exists, we showed how Monte Carlo Tree Search (MCTS) can be used to evaluate positions, and how MCTS with Upper Confidence Bounds applied to Trees (UCT) allows the performance of MCTS to converge toward what you would get from min-max. This took us to the UCB1 algorithm, which, apart from powering MCTS-UCT, is also a great general-purpose method for choosing between collections of unknown outcomes.
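The UCB1 rule mentioned above can be sketched compactly; this is a generic illustration rather than the exact code from the chapter, with `stats` assumed to be a list of (wins, plays) pairs per arm:

```python
import math

def ucb1(wins, plays, total_plays, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus that
    shrinks as an arm is played more often."""
    if plays == 0:
        return float("inf")  # always try an untested arm first
    return wins / plays + c * math.sqrt(math.log(total_plays) / plays)

def select_arm(stats):
    """Pick the index of the arm with the highest UCB1 score.
    stats is a list of (wins, plays) pairs, one per arm."""
    total = sum(plays for _, plays in stats)
    return max(range(len(stats)), key=lambda i: ucb1(*stats[i], total))
```

This is exactly the trade-off the chapter discussed: the first term favours arms that have paid off so far, while the bonus term keeps rarely-tried arms in the running, which is why the same rule works both for MCTS-UCT node selection and for general bandit-style choices.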
We then looked at how reinforcement learning can be integrated into these approaches, and saw how the policy gradient can be used to train deep networks...