In this section, we'll use Q-learning in combination with a simple neural network to control an agent in the cart-pole task, with an ε-greedy policy and experience replay. Cart-pole is a classic RL problem: the agent must balance a pole attached to a cart via a joint. At every time step, the agent can push the cart either left or right, and it receives a reward of 1 for every time step that the pole stays balanced. The episode ends when the pole tilts too far from upright (more than 12 degrees in Gym's implementation) or the cart moves more than 2.4 units from the center.
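The ε-greedy policy and experience replay do not depend on the network itself, so they can be sketched on their own. The following is a minimal pure-Python sketch. It assumes the Q-network's output for a state is available as a plain list of per-action values. The names `ReplayBuffer`, `epsilon_greedy`, and `q_target` are hypothetical helpers for illustration, not part of Gym or any other library, and the discount factor of 0.99 is an assumed default.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch without replacement; this breaks the
        # correlation between consecutive steps of the same episode
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # A terminal state has no future reward to bootstrap from
        return reward
    return reward + gamma * max(next_q_values)
```

In training, each environment step is appended to the buffer, and the network is updated on sampled minibatches toward `q_target` rather than on the most recent transition alone. Epsilon is usually decayed over time so the agent explores early and exploits later.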
To help us with this, we'll use OpenAI Gym (https://gym.openai.com/), an open source toolkit for developing and comparing RL algorithms. It lets us train agents on a variety of tasks, such as walking, or playing games such as Pong, Pinball, other Atari games, and even Doom.
We can install it with pip:
pip install gym[all]
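Once Gym is installed, running a single cart-pole episode is a quick way to check the setup. The following is a minimal sketch that takes any Gym-style environment object. It handles both the older step API (`obs, reward, done, info`) and the one introduced in gym 0.26 (`obs, reward, terminated, truncated, info`). The helper name `run_episode` is hypothetical, and the 500-step cap is assumed to match the `CartPole-v1` time limit.

```python
def run_episode(env, policy, max_steps=500):
    """Run one episode and return the total (undiscounted) reward.

    `policy` maps an observation to an action index
    (in cart-pole, 0 pushes the cart left and 1 pushes it right).
    """
    reset_out = env.reset()
    # gym >= 0.26 returns (obs, info); older versions return obs only
    obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out
    total_reward = 0.0
    for _ in range(max_steps):
        step_out = env.step(policy(obs))
        if len(step_out) == 5:
            # Newer API: termination and time-limit truncation are separate flags
            obs, reward, terminated, truncated, _ = step_out
            done = terminated or truncated
        else:
            # Older API: a single done flag
            obs, reward, done, _ = step_out
        total_reward += reward
        if done:
            break
    return total_reward
```

With Gym installed, a random agent can be run as `env = gym.make("CartPole-v1")` followed by `run_episode(env, lambda obs: env.action_space.sample())`. Such an agent usually drops the pole quickly, which gives a baseline for the Q-learning agent to beat.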