Book Image

Deep Reinforcement Learning Hands-On

By : Maxim Lapan
Book Image

Deep Reinforcement Learning Hands-On

By: Maxim Lapan

Overview of this book

Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest DL tools and their limitations. You will evaluate methods including Cross-entropy and policy gradients, before applying them to real-world environments. Take on both the Atari set of virtual games and family favorites such as Connect4. The book provides an introduction to the basics of RL, giving you the know-how to code intelligent learning agents to take on a formidable array of practical tasks. Discover how to implement Q-learning on 'grid world' environments, teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.
Table of Contents (23 chapters)
Deep Reinforcement Learning Hands-On
Contributors
Preface
Other Books You May Enjoy
Index

The REINFORCE method


The formula of PG that we’ve just seen is used by most of the policy-based methods, but the details can vary. One very important point is how exactly gradient scales Q(s, a) are calculated. In the cross-entropy method from Chapter 4, The Cross-Entropy Method, we played several episodes, calculated the total reward for each of them, and trained on transitions from episodes with a better-than-average reward. This training procedure is the PG method with Q(s, a) = 1 for actions from good episodes (with large total reward) and Q(s, a) = 0 for actions from worse episodes.

The cross-entropy method worked even with those simple assumptions, but the obvious improvement will be to use Q(s, a) for training instead of just 0 and 1. So why should it help? The answer is a more fine-grained separation of episodes. For example, transitions of the episode with the total reward = 10 should contribute to the gradient more than transitions from the episode with the reward = 1. The second...