In the following code example, we implement a simple MDP:
import numpy as np


class MDP:
    """Defines a Markov Decision Process containing:

    - States, s
    - Actions, a
    - Rewards, r(s, a)
    - Transition matrix, t(s, a, _s)

    Includes a set of abstract methods that extending classes
    will need to implement.
    """

    def __init__(self, states=None, actions=None, rewards=None, transitions=None,
                 discount=.99, tau=.01, epsilon=.01):
        """
        states: 1-D array
            The states of the environment.
        actions: 1-D array
            The possible actions available to the agent.
        rewards: 2-D array
            The reward for each action at each state of the environment.
        transitions: 3-D array
            The transition probabilities between the states of the
            environment for each action, indexed as t(s, a, _s).
        """
        self.states = np.array(states)
        self.actions = np.array(actions)
        self.rewards = np.array(rewards)
        self.transitions = np.array(transitions)
        self.discount = discount
        self.tau = tau
        self.epsilon = epsilon
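To make the roles of these components concrete, here is a minimal sketch of value iteration over such an MDP, using plain NumPy arrays in the same shapes described above (rewards as r(s, a), transitions as t(s, a, _s)). The 2-state, 2-action numbers are hypothetical example data, not taken from the text; the loop uses `discount` and treats `epsilon` as the convergence threshold.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (example data only).
states = np.array([0, 1])
actions = np.array([0, 1])
# rewards[s, a]: reward for taking action a in state s.
rewards = np.array([[0.0, 1.0],
                    [2.0, 0.0]])
# transitions[s, a, s2]: probability of moving from s to s2 under action a.
# Each transitions[s, a] row sums to 1.
transitions = np.array([[[0.9, 0.1], [0.2, 0.8]],
                        [[0.5, 0.5], [1.0, 0.0]]])
discount, epsilon = 0.99, 0.01

# Value iteration: back up V until the largest update falls below epsilon.
V = np.zeros(len(states))
while True:
    # Q[s, a] = r(s, a) + discount * sum_s2 t(s, a, s2) * V[s2]
    Q = rewards + discount * transitions @ V
    V_new = Q.max(axis=1)          # act greedily in each state
    if np.abs(V_new - V).max() < epsilon:
        break
    V = V_new

policy = Q.argmax(axis=1)          # greedy action per state
```

Note that `transitions @ V` contracts the last axis of the (s, a, _s) array against the value vector, yielding the expected next-state value for every (s, a) pair in one step.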