
Deep Reinforcement Learning Hands-On

By: Maxim Lapan


The AlphaGo Zero method


Overview

At a high level, the method consists of three components, all of which will be explained in detail later, so don't worry if something isn't completely clear from this section:

  • We constantly traverse the game tree using the Monte Carlo Tree Search (MCTS) algorithm. Its core idea is to walk down the game states semi-randomly, expanding them and gathering statistics about the frequency of moves and the underlying game outcomes. As the game tree is huge in both depth and width, we don't try to build the full tree; we just randomly sample its most promising paths (that's the source of the method's name).

  • At every moment, we have a best player, which is the model used to generate the data via self-play. Initially, this model has random weights, so it makes moves randomly, like a four-year-old who is just learning how chess pieces move. Over time, however, we replace this best player with better variations of it, which will generate...
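The traverse-expand-gather-statistics cycle from the first bullet can be sketched in a minimal, self-contained form. This is an illustrative toy, not the book's implementation: it uses a hypothetical single-player game (add 1 or 2 per move for three moves; the reward is the scaled final sum), plain UCB1 selection, and random rollouts. AlphaGo Zero replaces the rollout step with a neural network's value estimate and guides selection with the network's move probabilities, and in a two-player game the value would also flip sign between plies; those refinements are omitted here.

```python
import math
import random

# Toy single-player game (illustrative only): starting from 0, add 1 or 2
# per move for HORIZON moves; the reward is the final sum scaled to [0, 1].
ACTIONS = (1, 2)
HORIZON = 3

class Node:
    """One game state in the search tree, with MCTS visit statistics."""
    def __init__(self, total, depth, parent=None):
        self.total, self.depth, self.parent = total, depth, parent
        self.children = {}          # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

    def is_leaf(self):
        return not self.children

    def is_terminal(self):
        return self.depth == HORIZON

def ucb_select(node, c=1.4):
    # UCB1: exploit children with high average value, but keep exploring
    # the rarely tried ones (unvisited children are tried first).
    def score(child):
        if child.visits == 0:
            return float("inf")
        return child.value() + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)

def expand(node):
    for action in ACTIONS:
        node.children[action] = Node(node.total + action, node.depth + 1, node)

def rollout(node, rng):
    # Random playout to the end of the game; AlphaGo Zero replaces this
    # step with the value head of a neural network.
    total, depth = node.total, node.depth
    while depth < HORIZON:
        total += rng.choice(ACTIONS)
        depth += 1
    return total / (2 * HORIZON)

def backpropagate(node, value):
    # Push the outcome back up the path, updating the statistics.
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent

def mcts(root, n_simulations, rng):
    for _ in range(n_simulations):
        node = root
        while not node.is_leaf():       # 1. selection: walk down semi-randomly
            node = ucb_select(node)
        if not node.is_terminal():      # 2. expansion of the reached leaf
            expand(node)
            node = rng.choice(list(node.children.values()))
        value = rollout(node, rng)      # 3. simulation of the game outcome
        backpropagate(node, value)      # 4. backup of the statistics
    # The per-move visit counts are the "statistics about moves" we gather;
    # the most-visited child of the root is the recommended move.
    return max(root.children, key=lambda a: root.children[a].visits)

rng = random.Random(0)
root = Node(total=0, depth=0)
best_action = mcts(root, 2000, rng)
print(best_action)
```

Since always adding 2 is optimal in this toy game, the search concentrates its visits on that move, and `best_action` comes out as 2.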