The source code is in Chapter16/03_cartpole_ga.py, and it has a lot in common with our ES example. The main difference is the absence of the gradient ascent code, which has been replaced by the following network mutation function:
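```python
import copy

import numpy as np
import torch


def mutate_parent(net):
    # Work on a deep copy so the parent network stays untouched
    new_net = copy.deepcopy(net)
    for p in new_net.parameters():
        # Gaussian noise with the same shape as the parameter tensor
        noise_t = torch.from_numpy(
            np.random.normal(size=p.data.size()).astype(np.float32))
        p.data += NOISE_STD * noise_t
    return new_net
```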
The goal of the function is to create a mutated copy of the given policy by adding random noise to all of its weights. The parent's weights are kept untouched because parents are selected at random with replacement, so the same network may be used again later.
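To make the role of the untouched parent concrete, here is a minimal sketch of how a new population could be built from an already-selected pool of parents. It assumes the elite parents have been chosen elsewhere (sorting by reward is omitted), uses tiny `nn.Linear(4, 2)` networks as stand-in policies, and introduces a hypothetical helper `next_generation`; none of these names come from the original script. Each child is produced by picking a parent at random with replacement and mutating it, and the check at the end confirms that the parents themselves are unchanged.

```python
import copy

import numpy as np
import torch
import torch.nn as nn

NOISE_STD = 0.01        # noise scale from the chapter's hyperparameters
POPULATION_SIZE = 50    # population size from the chapter
PARENTS_COUNT = 10      # number of top performers kept as parents


def mutate_parent(net: nn.Module) -> nn.Module:
    """Return a noisy copy of net; the original network is not modified."""
    new_net = copy.deepcopy(net)
    for p in new_net.parameters():
        noise_t = torch.from_numpy(
            np.random.normal(size=p.data.size()).astype(np.float32))
        p.data += NOISE_STD * noise_t
    return new_net


def next_generation(parents, population_size, rng):
    """Hypothetical helper: sample parents with replacement and mutate each pick."""
    children = []
    for _ in range(population_size):
        idx = rng.integers(len(parents))  # the same parent can be chosen many times
        children.append(mutate_parent(parents[idx]))
    return children


np.random.seed(0)
torch.manual_seed(0)
rng = np.random.default_rng(0)

# Stand-in "elite" policies; in the real script these would be the best performers
parents = [nn.Linear(4, 2) for _ in range(PARENTS_COUNT)]
snapshot = [copy.deepcopy(p.state_dict()) for p in parents]

population = next_generation(parents, POPULATION_SIZE, rng)

# Parents are identical to their snapshot, so they can be reused in later picks
parents_intact = all(
    torch.equal(p.weight, s["weight"]) and torch.equal(p.bias, s["bias"])
    for p, s in zip(parents, snapshot))
print("population size:", len(population))
print("parents intact:", parents_intact)
```

Because `mutate_parent` never writes into the parent's tensors, sampling with replacement is safe: a strong parent can contribute several slightly different children in the same generation.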
```python
NOISE_STD = 0.01
POPULATION_SIZE = 50
PARENTS_COUNT = 10
```
The number of hyperparameters is even smaller than with ES: it includes the standard deviation of the noise added on mutation, the population size, and the number of top performers used to produce the subsequent...