Deep Reinforcement Learning Hands-On

By: Maxim Lapan


CartPole variance


To check this theoretical conclusion in practice, let's plot the variance of the policy gradient (PG) during training, both for the version with the baseline and for the version without it. The complete example is in Chapter10/01_cartpole_pg.py, and most of the code is the same as in Chapter 9, Policy Gradients – An Alternative. The differences in this version are the following:

  • It now accepts the command-line option --baseline, which enables subtraction of the mean from the reward. By default, no baseline is used.

  • On every iteration of the training loop, we gather the gradients from the policy loss and use this data to calculate the variance.
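
The first difference can be illustrated with a minimal sketch. It is not the code from Chapter10/01_cartpole_pg.py: the helper names build_parser and reward_scales are hypothetical, and the use of a running mean over the rewards seen so far is an assumption about how the mean is estimated.

```python
import argparse


def build_parser():
    """Hypothetical helper: command line with an optional --baseline switch."""
    parser = argparse.ArgumentParser()
    # store_true keeps the baseline disabled unless the flag is passed
    parser.add_argument("--baseline", default=False, action="store_true",
                        help="Subtract the mean reward from every reward")
    return parser


def reward_scales(rewards, use_baseline):
    """Return the values used to scale the gradient of each step.

    Assumption: the baseline is the running mean of the rewards seen so far.
    """
    scales = []
    reward_sum = 0.0
    for step_idx, reward in enumerate(rewards):
        reward_sum += reward
        baseline = reward_sum / (step_idx + 1)
        scales.append(reward - baseline if use_baseline else reward)
    return scales


# Simulate running the script with the flag enabled
args = build_parser().parse_args(["--baseline"])
scales = reward_scales([1.0, 2.0, 3.0], args.baseline)
```

Without the flag, the rewards are passed through unchanged, so the two variants can be compared from the same script.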

To gather only the gradients from the policy loss and exclude the gradients from the entropy bonus added for exploration, we need to calculate the gradients in two stages. Luckily, PyTorch allows this to be done easily. Below, only the relevant part of the training loop is included to illustrate the idea.

        optimizer.zero_grad()
        logits_v = net(states_v)
        log_prob_v...
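
Since the excerpt above is cut short, here is a self-contained sketch of the two-stage idea. It is an illustration under stated assumptions, not the book's listing: the small network, the random batch (states_v, actions_t, scales_v), and the ENTROPY_BETA value are hypothetical stand-ins. The point it shows is that calling backward() on the policy loss first, with retain_graph=True, lets us read the policy-only gradients before the entropy gradients are accumulated on top of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the real network and training batch
net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
states_v = torch.randn(8, 4)             # batch of observations
actions_t = torch.randint(0, 2, (8,))    # actions taken
scales_v = torch.randn(8)                # rewards, with or without baseline
ENTROPY_BETA = 0.01                      # assumed entropy bonus weight

optimizer.zero_grad()
logits_v = net(states_v)
log_prob_v = F.log_softmax(logits_v, dim=1)
prob_v = F.softmax(logits_v, dim=1)

# Stage 1: backpropagate only the policy loss, keeping the graph alive
# so that a second backward pass through the same network is possible.
chosen_log_prob_v = log_prob_v[torch.arange(len(actions_t)), actions_t]
loss_policy_v = -(scales_v * chosen_log_prob_v).mean()
loss_policy_v.backward(retain_graph=True)

# At this point .grad holds policy-loss gradients only; torch.cat copies them
policy_grads = torch.cat([p.grad.detach().flatten()
                          for p in net.parameters() if p.grad is not None])
grad_variance = policy_grads.var().item()

# Stage 2: backpropagate the entropy bonus; its gradients are accumulated
# on top of the policy gradients already stored in .grad.
entropy_v = -(prob_v * log_prob_v).sum(dim=1).mean()
entropy_loss_v = -ENTROPY_BETA * entropy_v
entropy_loss_v.backward()
optimizer.step()
```

The value in grad_variance is what would be logged at every iteration to compare the two versions.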