Hands-On Intelligent Agents with OpenAI Gym

By: Palanisamy P

Overview of this book

Many real-world problems can be broken down into tasks that require a series of decisions to be made or actions to be taken. Solving such tasks without being explicitly programmed requires a machine to be artificially intelligent and capable of learning to adapt. This book is an easy-to-follow guide to implementing learning algorithms for software agents that solve discrete or continuous sequential decision-making and control tasks. Hands-On Intelligent Agents with OpenAI Gym takes you through the process of building intelligent agent algorithms using deep reinforcement learning, starting from the implementation of the building blocks for configuring, training, logging, visualizing, testing, and monitoring the agent. You will walk through the process of building intelligent agents from scratch to perform a variety of tasks. In the closing chapters, the book provides an overview of the latest learning environments and learning algorithms, along with pointers to more resources that will help you take your deep reinforcement learning skills to the next level.
Table of Contents (12 chapters)

SARSA and Q-learning

It is also very useful for an agent to learn the action-value function $Q(S, A)$, which informs the agent about the long-term value of taking action $A$ in state $S$, so that the agent can take the actions that maximize its expected, discounted future reward. The SARSA and Q-learning algorithms enable an agent to learn that! The following table summarizes the update equations for the SARSA and Q-learning algorithms:

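Learning method | Action-value function update
SARSA | $Q(S, A) \leftarrow Q(S, A) + \alpha \left[ R + \gamma Q(S', A') - Q(S, A) \right]$
Q-learning | $Q(S, A) \leftarrow Q(S, A) + \alpha \left[ R + \gamma \max_{a} Q(S', a) - Q(S, A) \right]$

Here, $\alpha$ is the learning rate and $\gamma$ is the discount factor.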
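The two update rules differ only in how they estimate the value of the next state: SARSA uses the value of the action the agent actually picks next, while Q-learning uses the highest value available in the next state. The following is a minimal sketch of both updates, assuming a tabular Q stored as a nested list indexed as `Q[state][action]`. The function names, the default `alpha` and `gamma`, and the numbers in the table are illustrative assumptions, not code from the book.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """SARSA: bootstrap from the action actually chosen in the next state."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q-learning: bootstrap from the highest-valued action in the next state."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])


def make_q():
    # Hypothetical table with 2 states and 2 actions; the values are made up.
    return [[0.0, 0.0], [1.0, 2.0]]


# Apply each rule to the same transition (s=0, a=0, r=1.0, s_next=1).
q_sarsa = make_q()
sarsa_update(q_sarsa, s=0, a=0, r=1.0, s_next=1, a_next=0)

q_ql = make_q()
q_learning_update(q_ql, s=0, a=0, r=1.0, s_next=1)

print("SARSA:     ", q_sarsa[0][0])
print("Q-learning:", q_ql[0][0])
```

In this example the next action chosen for SARSA (action 0, value 1.0) is not the best one in the next state (action 1, value 2.0), so the Q-learning update moves the estimate further than the SARSA update does.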

SARSA is so named because of the sequence State->Action->Reward->State'->Action' that the algorithm's update step depends on. The description of the sequence goes like this: the agent, in state S, takes an action A and gets a reward R, and ends up in the next state S', after which the agent decides to take an action A' in the new state...
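The State, Action, Reward, State', Action' ordering can be sketched as a training loop. This is only an illustration under stated assumptions: the four-state chain environment, the `step` and `epsilon_greedy` helpers, and all hyperparameter values are hypothetical stand-ins and are not taken from the book or from OpenAI Gym.

```python
import random

N_STATES = 4      # hypothetical chain: states 0..3, state 3 is terminal
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 only on reaching the terminal state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[state])
    # Break ties randomly so the untrained agent still explores.
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])


def run_sarsa(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        S = 0
        A = epsilon_greedy(Q, S, epsilon, rng)                # choose A in S
        done = False
        while not done:
            S_next, R, done = step(S, A)                      # observe R and S'
            A_next = epsilon_greedy(Q, S_next, epsilon, rng)  # choose A' in S'
            # A terminal state has no future value, so the bootstrap term is 0.
            target = R + (0.0 if done else gamma * Q[S_next][A_next])
            Q[S][A] += alpha * (target - Q[S][A])
            S, A = S_next, A_next                             # S', A' become S, A
    return Q


Q = run_sarsa()
for state, values in enumerate(Q):
    print(state, values)
```

Note that the action A' chosen for the update is the same action the agent executes on the next iteration, which is what ties the SARSA update to the agent's own behavior.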