Types of Monte Carlo Methods
We have implemented the game of Blackjack using Monte Carlo. Typically, a Monte Carlo trajectory is a sequence of states, actions, and rewards. Within a single episode, the same state may be visited more than once. For example, the trajectory could be S0, S1, S2, S0, S3, where S0 appears twice. How do we handle the calculation of the value estimate when a state is visited multiple times?
Broadly, this question separates Monte Carlo methods into two types: first visit and every visit. We will look at the implications of both methods.
As stated previously, in Monte Carlo methods, we approximate the value function by averaging the returns, that is, the cumulative rewards observed after visiting a state. In the first visit Monte Carlo method, only the first visit to a state in an episode is included in the average. For example, in a game of traversing a maze, you could visit the same place several times. In the first visit Monte Carlo method, only the first visit is used for the calculation of the reward...
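To make the distinction concrete, here is a minimal sketch in Python that computes both estimates for the example trajectory S0, S1, S2, S0, S3. It is an illustration only, not the Blackjack implementation: the function name mc_state_values, the (state, reward) episode format, the reward of 1.0 per step, and the discount factor gamma are all assumptions made for this example.

```python
from collections import defaultdict


def mc_state_values(episodes, gamma=1.0, first_visit=True):
    """Estimate state values by averaging returns (hypothetical helper).

    Each episode is a list of (state, reward) pairs, where reward is
    the reward received after leaving that state.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        # Record the time step at which each state first appears.
        first_index = {}
        for t, (state, _) in enumerate(episode):
            first_index.setdefault(state, t)

        g = 0.0
        # Walk backwards so g holds the discounted return from step t.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = reward + gamma * g
            # First visit: skip every occurrence except the earliest one.
            if first_visit and first_index[state] != t:
                continue
            returns_sum[state] += g
            returns_count[state] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Assumed toy episode: S0 is visited twice, each step yields reward 1.0.
episode = [("S0", 1.0), ("S1", 1.0), ("S2", 1.0), ("S0", 1.0), ("S3", 1.0)]
first = mc_state_values([episode], first_visit=True)
every = mc_state_values([episode], first_visit=False)
print(first["S0"], every["S0"])
```

Under these assumptions, the first visit estimate for S0 uses only the return from its first occurrence (5.0), while the every visit estimate averages the returns from both occurrences (5.0 and 2.0, giving 3.5). States that appear only once receive the same value under both methods.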