We have already discussed the Pong environment in Chapter 4, Policy Gradients. We will use the following code to create the A3C for Pong-v0 in OpenAI Gym:
import multiprocessing
import threading
import tensorflow as tf
import numpy as np
import gym
import os
import shutil
import matplotlib.pyplot as plt

game_env = 'Pong-v0'
num_workers = multiprocessing.cpu_count()
max_global_episodes = 100000
global_network_scope = 'globalnet'
global_iteration_update = 20
gamma = 0.9
beta = 0.0001
lr_actor = 0.0001   # learning rate for actor
lr_critic = 0.0001  # learning rate for critic
global_running_rate = []
global_episode = 0

env = gym.make(game_env)
num_actions = env.action_space.n

tf.reset_default_graph()
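Before building the networks, it helps to see the threading pattern A3C relies on: one worker thread per CPU core, all sharing a global episode counter. The following is a minimal, TensorFlow-free sketch of that pattern; the episode body and gradient pushes are elided, and the small episode cap is chosen here purely for illustration.

```python
import multiprocessing
import threading

max_global_episodes = 100  # reduced from the chapter's 100000 for illustration
global_episode = 0
lock = threading.Lock()

def worker():
    """Each worker claims episodes from the shared counter until it is exhausted."""
    global global_episode
    while True:
        with lock:
            if global_episode >= max_global_episodes:
                return
            global_episode += 1
        # ... run one episode and push gradients to the global network here ...

# one worker thread per CPU core, as in the A3C setup above
num_workers = multiprocessing.cpu_count()
threads = [threading.Thread(target=worker) for _ in range(num_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every increment happens under the lock, the workers never over-count episodes, which is the same coordination the TensorFlow workers need when they update the shared `global_episode`.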
The input state image preprocessing function:
def preprocessing_image(obs): # where obs is the single frame of the game as the input
    """ prepro 210x160x3 uint8 frame into 6400 (80x80) 1D float vector """
    #the values below have been precomputed through trial...
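A complete version of this preprocessing step can be sketched as follows. It crops the frame to the playing field, downsamples by a factor of two, and binarizes the result; the background palette values 144 and 109 are the ones commonly used for Pong's default colour scheme and are assumptions here, matching the "precomputed through trial" note above.

```python
import numpy as np

def preprocessing_image(obs):
    """Prepro 210x160x3 uint8 frame into 6400 (80x80) 1D float vector."""
    img = obs[35:195]        # crop to the 160x160 playing field
    img = img[::2, ::2, 0]   # downsample by a factor of 2, keep one channel -> 80x80
    img = img.copy()         # avoid mutating the caller's frame in place
    img[img == 144] = 0      # erase background (assumed palette value)
    img[img == 109] = 0      # erase background (assumed palette value)
    img[img != 0] = 1        # paddles and ball become 1
    return img.astype(np.float32).ravel()  # flatten to a 6400-vector
```

Binarizing discards colour and brightness the agent does not need, so the network only has to learn from the positions of the paddles and the ball.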