The whole example is in the
Chapter05/02_frozenlake_q_learning.py file, and the difference is really minor. The most obvious change is to our value table. In the previous example, we kept the value of every state, so the key in the dictionary was just the state. Now we need to store values of the Q-function, which has two parameters, state and action, so the key in the value table is now the composite (state, action).
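To illustrate, here is a minimal sketch of how such a composite-keyed table could be declared, assuming the same defaultdict-based Agent structure as in the previous example (the exact code in the file may differ):

    import collections

    class Agent:
        def __init__(self, env):
            self.env = env
            # the dictionary key is now the composite (state, action)
            # pair, rather than just the state as in the V-function version
            self.values = collections.defaultdict(float)

With this layout, reading a Q-value is a plain dictionary lookup: self.values[(state, action)].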
The second difference is in our
calc_action_value function. We just don't need it anymore, as our action values are stored in the value table. Finally, the most important change in the code is in the agent's
value_iteration method. Before, it was just a wrapper around the
calc_action_value call, which did the job of the Bellman approximation. Now, as this function is gone and its results live in the value table, we need to do this approximation in the value_iteration method itself.
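As an illustration of the first two changes, action selection can now read the stored Q-values directly instead of calling calc_action_value. A sketch of what this might look like, assuming the table above (not necessarily the file's exact code):

    def select_action(self, state):
        best_action, best_value = None, None
        for action in range(self.env.action_space.n):
            # a direct table lookup replaces the old calc_action_value call
            action_value = self.values[(state, action)]
            if best_value is None or best_value < action_value:
                best_value = action_value
                best_action = action
        return best_action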
Let's look at the code. As it's almost the same, I'll jump directly to the most interesting value_iteration method.
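Here is a sketch of how the Bellman approximation over the composite-keyed table could be implemented, assuming the transits and rewards dictionaries from the previous example (which track transition counts and observed rewards) and an assumed discount factor GAMMA:

    GAMMA = 0.9  # assumed discount factor

    # method of the Agent class sketched above
    def value_iteration(self):
        for state in range(self.env.observation_space.n):
            for action in range(self.env.action_space.n):
                action_value = 0.0
                target_counts = self.transits[(state, action)]
                total = sum(target_counts.values())
                for tgt_state, count in target_counts.items():
                    reward = self.rewards[(state, action, tgt_state)]
                    # the value of the target state is the Q-value of its
                    # best action, per the Bellman equation for Q(s, a)
                    best_action = self.select_action(tgt_state)
                    val = reward + GAMMA * self.values[(tgt_state, best_action)]
                    # weight by the estimated transition probability
                    action_value += (count / total) * val
                self.values[(state, action)] = action_value

Note how the update follows the Bellman equation for the Q-function: the value of each (state, action) pair is the expectation, over the estimated transition probabilities, of the immediate reward plus the discounted Q-value of the best action in the target state.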