Chapter 3
Robot Control System Using Deep Reinforcement Learning
Section 8
Q-learning Solution
Now we face the most demanding phase: training our system. In the Q-learning section, we learnt that the Gym library is built around the episodic setting of reinforcement learning, in which the agent's experience is divided into a series of episodes. At the start of each episode, the agent's initial state is sampled at random from a distribution, and the interaction proceeds until the environment reaches a terminal state. This procedure is repeated episode after episode, with the aim of maximizing the expected total reward per episode and reaching a high level of performance in as few episodes as possible.
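To make the episodic setting concrete, here is a minimal sketch of the loop described above, paired with a tabular Q-learning update. It deliberately does not use Gym itself: `ToyChainEnv` is a hypothetical six-state environment whose `reset()` and `step()` methods only loosely imitate Gym's interface (real Gym versions return different tuples), and the hyperparameters `alpha`, `gamma`, `epsilon` and `max_steps` are illustrative assumptions, not values taken from this chapter.

```python
import random


class ToyChainEnv:
    """Hypothetical episodic environment (not part of Gym).

    States 0..n_states-1 lie on a line; reaching the last state is terminal.
    Action 0 moves left, action 1 moves right.
    """

    def __init__(self, n_states=6, seed=0):
        self.n_states = n_states
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        # The initial state is sampled uniformly from the non-terminal states.
        self.state = self.rng.randrange(self.n_states - 1)
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        # Small penalty per step, +1 on reaching the terminal state.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episodes(env, n_episodes=200, alpha=0.5, gamma=0.9,
                 epsilon=0.1, max_steps=100, seed=0):
    """Run episodes with epsilon-greedy tabular Q-learning; return (Q, returns)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    returns = []  # total (undiscounted) reward collected in each episode
    for _ in range(n_episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max(range(2), key=lambda a: q[state][a])
            next_state, reward, done = env.step(action)
            # Q-learning target: no bootstrapping from a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            total += reward
            state = next_state
            if done:  # the episode ends at the terminal state
                break
        returns.append(total)
    return q, returns


env = ToyChainEnv()
q, returns = run_episodes(env)
print("mean return, first 20 episodes:", sum(returns[:20]) / 20)
print("mean return, last 20 episodes:", sum(returns[-20:]) / 20)
```

Comparing the mean return of early and late episodes is a simple way to watch the agent improve as episodes accumulate; the `max_steps` cap is an extra safeguard so that an episode cannot run forever if the agent keeps wandering away from the terminal state.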