The State–Action–Reward–State–Action (SARSA) algorithm is an on-policy learning algorithm. Just like Q-learning, SARSA is a temporal difference (TD) learning method; that is, it looks ahead to the next step in the episode to estimate future rewards. The major difference between SARSA and Q-learning is that SARSA does not use the action with the maximum Q-value to update the Q-value of the current state-action pair. Instead, it uses the Q-value of the action actually selected by the current policy, which may be an exploratory action chosen by a strategy such as ε-greedy. The name SARSA comes from the fact that the Q-value update uses the quintuple (s, a, r, s', a'), where:
- s, a: the current state and action
- r: the reward observed after taking action a
- s': the next state reached after taking action a
- a': the action to be taken in state s' (selected by the same policy)
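Using this quintuple, the Q-value of the current state-action pair is updated as follows, where α is the learning rate and γ is the discount factor:

Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') − Q(s,a)]

For comparison, Q-learning would replace Q(s',a') in this target with max_a' Q(s',a'), regardless of which action the policy actually takes next.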
The steps involved in the SARSA algorithm are as follows:
- Initialize the Q-table randomly.
- For each episode, observe the initial state s and choose an action a using the ε-greedy policy. Then, for each step of the episode:
  - Take action a; observe the reward r and the next state s'.
  - Choose the next action a' from s' using the same ε-greedy policy.
  - Update Q(s,a) using the SARSA update rule given above, then set s ← s' and a ← a'.
  - End the episode when a terminal state is reached.
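To make these steps concrete, here is a minimal Python sketch of tabular SARSA. It assumes a Gymnasium-style environment with discrete states and actions; the FrozenLake-v1 environment, the hyperparameters, and the episode count are illustrative choices, not part of the algorithm itself.

```python
import numpy as np
import gymnasium as gym  # assumed dependency: pip install "gymnasium[toy-text]"

env = gym.make("FrozenLake-v1")              # illustrative discrete environment
n_states = env.observation_space.n
n_actions = env.action_space.n

alpha, gamma, epsilon = 0.1, 0.99, 0.1       # illustrative, untuned hyperparameters
rng = np.random.default_rng(seed=0)
Q = rng.uniform(size=(n_states, n_actions))  # initialize the Q-table randomly

def epsilon_greedy(state):
    # Explore with probability epsilon; otherwise act greedily on Q.
    if rng.random() < epsilon:
        return env.action_space.sample()
    return int(np.argmax(Q[state]))

for episode in range(5000):
    state, _ = env.reset()
    action = epsilon_greedy(state)           # choose a for the initial state s
    done = False
    while not done:
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        next_action = epsilon_greedy(next_state)  # a' comes from the same policy
        # SARSA target: bootstrap on Q(s', a') for the action actually selected,
        # not max_a Q(s', a) as in Q-learning; drop the bootstrap at terminal states.
        target = reward + gamma * Q[next_state, next_action] * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state, action = next_state, next_action
```

Because a' is both used in the update and then executed, the learned Q-values reflect the ε-greedy behavior policy itself, which is exactly what makes SARSA on-policy.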