Deep Deterministic Policy Gradients
In this section, we will apply the DDPG technique to problems with continuous action spaces. Moreover, we will learn how to code a lunar lander simulation to understand DDPG.
Note
We suggest that you type all the code given in this section into your Jupyter notebook as we will be using it later, in Exercise 11.02, Creating a Learning Agent.
We are going to use the OpenAI Gym Lunar Lander environment for continuous action spaces here. Let's start by importing the essentials:
import os
import gym
import torch as T
import numpy as np
Now, we will learn how to define some classes, namely the OUActionNoise, ReplayBuffer, ActorNetwork, and CriticNetwork classes, which will help us implement the DDPG technique. By the end of this section, you'll have the complete code base that applies DDPG within our OpenAI Gym game environment.
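To preview the role of one of these classes, a replay buffer stores past transitions so the agent can learn from randomly sampled mini-batches rather than correlated consecutive steps. The following is a minimal NumPy sketch, not the exercise's final implementation; the attribute and method names are illustrative:

```python
import numpy as np

class ReplayBuffer:
    """Fixed-size circular buffer of (state, action, reward,
    next_state, done) transitions; a simplified sketch."""
    def __init__(self, max_size, state_dim, action_dim):
        self.max_size = max_size
        self.counter = 0  # total transitions stored so far
        self.states = np.zeros((max_size, state_dim), dtype=np.float32)
        self.actions = np.zeros((max_size, action_dim), dtype=np.float32)
        self.rewards = np.zeros(max_size, dtype=np.float32)
        self.next_states = np.zeros((max_size, state_dim), dtype=np.float32)
        self.dones = np.zeros(max_size, dtype=bool)

    def store(self, state, action, reward, next_state, done):
        i = self.counter % self.max_size  # overwrite the oldest entry when full
        self.states[i] = state
        self.actions[i] = action
        self.rewards[i] = reward
        self.next_states[i] = next_state
        self.dones[i] = done
        self.counter += 1

    def sample(self, batch_size):
        # Sample uniformly from the filled portion of the buffer
        size = min(self.counter, self.max_size)
        idx = np.random.choice(size, batch_size, replace=False)
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.dones[idx])
```

Sampling uniformly at random is what breaks the temporal correlation between consecutive environment steps, which stabilizes training of the actor and critic networks.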
Ornstein-Uhlenbeck Noise
First, we will define...
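As a preview of the idea, an Ornstein-Uhlenbeck process generates temporally correlated noise that reverts toward a mean, which makes it a popular choice for exploration in continuous action spaces. The following is a minimal sketch, not the exercise's final class; the parameter defaults here are illustrative:

```python
import numpy as np

class OUActionNoise:
    """Ornstein-Uhlenbeck process: temporally correlated,
    mean-reverting noise for continuous-action exploration."""
    def __init__(self, mu, sigma=0.15, theta=0.2, dt=1e-2, x0=None):
        self.mu = mu        # long-run mean the noise reverts to
        self.sigma = sigma  # scale of the random fluctuations
        self.theta = theta  # speed of mean reversion
        self.dt = dt        # time step
        self.x0 = x0        # optional starting value
        self.reset()

    def __call__(self):
        # x_{t+1} = x_t + theta * (mu - x_t) * dt + sigma * sqrt(dt) * N(0, 1)
        x = (self.x_prev
             + self.theta * (self.mu - self.x_prev) * self.dt
             + self.sigma * np.sqrt(self.dt)
               * np.random.normal(size=self.mu.shape))
        self.x_prev = x  # remember the last value: this creates the correlation
        return x

    def reset(self):
        self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu)
```

Because each sample depends on the previous one, successive actions are nudged in a consistent direction rather than jittering independently, which suits physical control tasks such as firing a lander's thrusters.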