The complete example is in Chapter16/01_cartpole_es.py. In this example, we use a single environment to check the fitness of the perturbed network weights. Our fitness function will be the undiscounted total reward for the episode:
#!/usr/bin/env python3
import gym
import time
import numpy as np
import torch
import torch.nn as nn
from tensorboardX import SummaryWriter
From the import statements, you can see how self-contained our example is. We're not using PyTorch optimizers, as we do not perform backpropagation at all. In fact, we could avoid PyTorch completely and work only with NumPy, as the only thing we use PyTorch for is to perform a forward pass and calculate the network's output.
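To illustrate the point about NumPy, here is a minimal sketch of a forward pass written without PyTorch. It is not the code from the example file: the two-layer architecture, the hidden size of 32, the weight initialization, the greedy action choice, and the names init_params and forward are all assumptions made for illustration. Only the observation size (4) and the number of actions (2) come from CartPole itself.

```python
import numpy as np


def init_params(obs_size, hidden_size, n_actions, seed=0):
    """Create small random weights for a two-layer network (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, size=(obs_size, hidden_size)),
        "b1": np.zeros(hidden_size),
        "w2": rng.normal(0.0, 0.1, size=(hidden_size, n_actions)),
        "b2": np.zeros(n_actions),
    }


def forward(params, obs):
    """Forward pass: linear -> ReLU -> linear -> softmax over actions."""
    hidden = np.maximum(0.0, obs @ params["w1"] + params["b1"])
    logits = hidden @ params["w2"] + params["b2"]
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


# CartPole observations have 4 components and there are 2 discrete actions.
# The hidden size of 32 is an assumed value, not taken from the example file.
params = init_params(obs_size=4, hidden_size=32, n_actions=2)
obs = np.array([0.01, -0.02, 0.03, 0.04])
probs = forward(params, obs)
action = int(np.argmax(probs))  # greedy choice, assumed for this sketch
```

Because the weights live in plain arrays, perturbing them amounts to adding noise arrays of the same shape, which is all a gradient-free method needs from the network.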
MAX_BATCH_EPISODES = 100
MAX_BATCH_STEPS = 10000
NOISE_STD = 0.01
LEARNING_RATE = 0.001
The number of hyperparameters is also small and includes the following values:
MAX_BATCH_EPISODES and MAX_BATCH_STEPS: The limits on the number of episodes and steps we use for training
NOISE_STD: The standard...