Deep Reinforcement Learning Hands-On

By: Maxim Lapan

Overview of this book

Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest DL tools and their limitations. You will evaluate methods including Cross-entropy and policy gradients, before applying them to real-world environments. Take on both the Atari set of virtual games and family favorites such as Connect4. The book provides an introduction to the basics of RL, giving you the know-how to code intelligent learning agents to take on a formidable array of practical tasks. Discover how to implement Q-learning on 'grid world' environments, teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.

Introduction


The overall motivation of the methods that we'll look at is to improve the stability of the policy update during training. Intuitively, there is a dilemma: on the one hand, we'd like to train as fast as we can, making large steps during the Stochastic Gradient Descent (SGD) update. On the other hand, a large update of the policy is usually a bad idea: the policy is a highly nonlinear function, so a large update can ruin what we've just learned. Things can become even worse in the RL setting, as a single bad policy update isn't simply corrected by subsequent updates. Instead, the degraded policy produces poor experience samples, which are then used in subsequent training steps and can break the policy completely. Thus, we want to avoid large updates by all means possible. One naive solution is to use a small learning rate and take baby steps during SGD, but this significantly slows down convergence.
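To make the step-size trade-off concrete, here is a minimal sketch of a vanilla policy-gradient update in PyTorch (the library used throughout this book). The network shape, the update helper, and the learning rate are hypothetical choices for illustration, not the chapter's method:

import torch
import torch.nn as nn

# A toy policy network: 4 observation values in, 2 discrete actions out.
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))

# A small learning rate keeps every SGD step a "baby step",
# at the cost of slower convergence.
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-4)

def update(states, actions, advantages):
    """One vanilla policy-gradient step: maximize E[log pi(a|s) * A]."""
    logits = policy(states)                       # (N, 2)
    log_probs = torch.log_softmax(logits, dim=1)  # log pi(.|s)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(chosen * advantages).mean()          # negate: SGD minimizes
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # the step size is controlled only by lr

With lr set this small, each update barely moves the policy, which is exactly the slow-convergence cost just described; nothing in this update constrains how far a larger lr could push the policy in a single step.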

To break this vicious...