In this section, we will create a policy network that takes raw pixels from the Pong-v0 environment of OpenAI Gym as its input. The policy network is a neural network with a single hidden layer: the hidden layer is fully connected to the raw pixels at the input layer, and the output layer contains a single node that returns the probability of the paddle moving up. This approach closely follows Andrej Karpathy's solution for making an agent learn Pong using policy gradients, and I would like to thank him for it. We will implement a similar kind of approach.
The input is a pixel image of size 80*80 in grayscale (we will not use RGB, which would be 80*80*3). After preprocessing, we have an 80*80 binary grid that tells us the positions of the paddles and the ball, and this is what we feed into the neural network. Thus, the neural network consists of the following:
- Input layer (X): the 80*80 image flattened to a 6400*1 vector, that is, 6400 nodes
- Hidden layer: 200 nodes
- Output layer: 1 node
Therefore, the total parameters...
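The architecture above can be sketched in NumPy, in the spirit of Karpathy's approach. This is a minimal illustration, not the full training code: the initialization scheme, the ReLU hidden activation, and all variable names here are assumptions for the sketch.

```python
import numpy as np

H = 200        # hidden layer nodes
D = 80 * 80    # input dimension: flattened 80*80 frame = 6400 nodes

rng = np.random.default_rng(0)
# Scaled random initialization (an assumption; Karpathy's example uses a similar scheme)
W1 = rng.standard_normal((H, D)) / np.sqrt(D)  # input -> hidden weights
W2 = rng.standard_normal(H) / np.sqrt(H)       # hidden -> single output node

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def policy_forward(x):
    """Return P(paddle moves up) and the hidden activations for one frame."""
    h = np.maximum(0.0, W1 @ x)   # ReLU hidden layer (200 nodes)
    p = sigmoid(W2 @ h)           # single output node: probability of UP
    return p, h

# A fake binary 80*80 frame stands in for the preprocessed Pong input
x = (rng.random(D) > 0.5).astype(np.float64)
p_up, _ = policy_forward(x)
print(p_up)

# Total weights: 6400*200 (input->hidden) + 200*1 (hidden->output)
print(W1.size + W2.size)  # 1280200
```

Counting only the weight matrices (no biases), the network has 6400*200 + 200*1 = 1,280,200 parameters, which is what the last line computes.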