As we saw in the previous chapter, there are several entities in RL's view of the world:
Agent: A person or a thing that takes an active role. In practice, it's some piece of code, which implements some policy. Basically, this policy must decide what action is needed at every time step, given our observations.
Let's show how both of them can be implemented in Python for a simplistic situation. We will define an environment that gives the agent random rewards for a limited number of steps, regardless of the agent's actions. This scenario is not very useful, but will allow us to focus on specific methods in both the environment and the agent classes. Let's start with the environment:
class Environment: def __init__(self): self.steps_left = 10
In the preceding code...