This is about a gridworld environment in OpenAI Gym called FrozenLake-v0, discussed in Chapter 2, Training Reinforcement Learning Agents Using OpenAI Gym. We implemented Q-learning and a Q-network (which we will discuss in future chapters) to gain an understanding of an OpenAI Gym environment.
Now, let's try to implement value iteration to obtain the utility value of each state in the FrozenLake-v0 environment, using the following code:
# importing dependency libraries
from __future__ import print_function
import gym
import numpy as np
import time

# Load the environment
env = gym.make('FrozenLake-v0')
s = env.reset()
print(s)
print()
env.render()
print()
print(env.action_space)       # number of actions
print(env.observation_space)  # number of states
print()
print("Number of actions : ", env.action_space.n)
print("Number of states : ", env.observation_space.n)
print()

# Value Iteration Implementation
# Initializing Utilities of all states with zeros...
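The listing stops at the utility initialization, but the heart of value iteration is the repeated Bellman optimality backup: each state's utility becomes the best action's expected reward plus discounted utility of the successor states, iterated until the values stop changing. The following is a minimal sketch of that update loop on a hypothetical two-state MDP (the states, transitions, rewards, and discount factor here are invented for illustration, not FrozenLake's dynamics); the transition table uses the same (probability, next_state, reward) tuple format that the FrozenLake environment exposes internally.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP for illustration only.
# transitions[s][a] is a list of (probability, next_state, reward) tuples.
transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
n_states, n_actions = 2, 2
gamma = 0.9  # discount factor (assumed for this example)

# Initialize utilities of all states with zeros, as in the text
U = np.zeros(n_states)

for _ in range(1000):
    # Bellman optimality backup for every state
    U_new = np.array([
        max(
            sum(p * (r + gamma * U[s2]) for p, s2, r in transitions[s][a])
            for a in range(n_actions)
        )
        for s in range(n_states)
    ])
    # Stop once the utilities have converged
    if np.max(np.abs(U_new - U)) < 1e-8:
        break
    U = U_new

print(np.round(U, 3))
```

For state 1, always taking action 1 yields reward 1 per step, so its utility converges to 1/(1 - gamma) = 10; state 0's utility follows from the 0.8/0.2 transition split. The same loop applies to FrozenLake-v0 once the transition tuples are read from the environment's internal model instead of this toy dictionary.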