Reinforcement Learning with TensorFlow

By : Sayon Dutta

Overview of this book

Reinforcement learning (RL) allows you to develop smart, quick and self-learning systems in your business surroundings. It's an effective method for training learning agents and solving a variety of problems in Artificial Intelligence - from games, self-driving cars and robots, to enterprise applications such as data center energy saving (cooling data centers) and smart warehousing solutions. The book covers major advancements and successes achieved in deep reinforcement learning by synergizing deep neural network architectures with reinforcement learning. You'll also be introduced to the concept of reinforcement learning, its advantages and the reasons why it's gaining so much popularity. You'll explore MDPs, Monte Carlo tree searches, dynamic programming such as policy and value iteration, and temporal difference learning such as Q-learning and SARSA. You will use TensorFlow and OpenAI Gym to build simple neural network models that learn from their own actions. You will also see how reinforcement learning algorithms play a role in games, image processing and NLP. By the end of this book, you will have gained a firm understanding of what reinforcement learning is and understand how to put your knowledge to practical use by leveraging the power of TensorFlow and OpenAI Gym.

Introduction to TensorFlow and OpenAI Gym

TensorFlow is a numerical computation library created by the Google Brain team at Google. Thanks to its dataflow programming model, it is heavily used as a deep learning library in both research and development. Since its release in 2015, TensorFlow has built a very large community.

OpenAI Gym is a reinforcement learning playground created by the team at OpenAI. Its aim is to provide a simple interface, since creating an environment is itself a tedious task in reinforcement learning. It offers a broad list of environments in which you can test and benchmark your reinforcement learning algorithms.

Basic computations in TensorFlow

TensorFlow is built on two things: the computational graph, which we discussed earlier in this chapter, and tensors. A tensor is an n-dimensional array, so scalars, vectors, and matrices are all tensors. Here, we will try some basic computations to get started with TensorFlow. Please try to implement this section in a Python environment such as Jupyter Notebook.

For the TensorFlow installation and dependencies please refer to the following link:

https://www.tensorflow.org/install/

Import TensorFlow with the following command:

import tensorflow as tf

tf.zeros() and tf.ones() are two of the functions that instantiate basic tensors. tf.zeros() takes a tensor shape (that is, a tuple) and returns a tensor of that shape with every value set to zero. Similarly, tf.ones() takes a tensor shape but returns a tensor of that shape containing only ones. Try the following commands in a Python shell to create a tensor:

>>> tf.zeros(3)

<tf.Tensor 'zeros:0' shape=(3,) dtype=float32>

>>> tf.ones(3)

<tf.Tensor 'ones:0' shape=(3,) dtype=float32>

As you can see, TensorFlow returns a reference to the tensor and not the value of the tensor. To get the value, we run the tensor inside a session, either by passing it to the session's run() method or by calling the tensor's own eval() method, as follows:

>>> a = tf.zeros(3)
>>> with tf.Session() as sess:
        sess.run(a)
        a.eval()

array([0., 0., 0.], dtype=float32)

array([0., 0., 0.], dtype=float32)
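
The listings in this section use the TensorFlow 1.x session API. As a hedged sketch, assuming a TensorFlow installation that provides the tf.compat.v1 module (as TensorFlow 2.x does), the same evaluation can be reproduced by building the tensors in an explicit graph and opening a compatibility session on it. The sketch also shows that run() accepts a list of tensors and returns their values together. The names graph, a_val, b_val, and a_again are illustrative only.

```python
import numpy as np
import tensorflow as tf

# Assumption: tf.compat.v1 is available (it is in TensorFlow 2.x).
graph = tf.Graph()
with graph.as_default():
    # Inside the graph context these are symbolic tensors, not values.
    a = tf.zeros(3)
    b = tf.ones(3)

with tf.compat.v1.Session(graph=graph) as sess:
    # Passing a list of tensors returns a list of NumPy arrays.
    a_val, b_val = sess.run([a, b])
    # eval() relies on the default session set up by the with-block.
    a_again = a.eval()

print(a_val, b_val, a_again)
```

Both routes return NumPy arrays, so the results can be used directly with the rest of the Python numerical stack.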

Next come the tf.fill() and tf.constant() methods to create a tensor of a certain shape and value:

>>> a = tf.fill((2,2),value=4.)
>>> b = tf.constant(4.,shape=(2,2))
>>> with tf.Session() as sess:
        sess.run(a)
        sess.run(b)

array([[ 4., 4.],
[ 4., 4.]], dtype=float32)

array([[ 4., 4.],
[ 4., 4.]], dtype=float32)

Next, we have functions that can randomly initialize a tensor. Among them, the most frequently used ones are:

tf.random_normal(): Samples random values from the normal distribution with a specified mean and standard deviation
tf.random_uniform(): Samples random values from the uniform distribution over a specified range

>>> a = tf.random_normal((2,2),mean=0,stddev=1)
>>> b = tf.random_uniform((2,2),minval=-3,maxval=3)
>>> with tf.Session() as sess:
        sess.run(a)
        sess.run(b)

array([[-0.31790468, 1.30740941],
[-0.52323157, -0.2980336 ]], dtype=float32)

array([[ 1.38419437, -2.91128755],
[-0.80171156, -0.84285879]], dtype=float32)

Variables in TensorFlow are holders for tensors and are defined by the function tf.Variable():

>>> a = tf.Variable(tf.ones((2,2)))
>>> a

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32_ref>

Evaluating a variable fails unless the variable has first been explicitly initialized, which is done by running tf.global_variables_initializer() within a session:

>>> a = tf.Variable(tf.ones((2,2)))
>>> with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        a.eval()

array([[ 1., 1.],
[ 1., 1.]], dtype=float32)

Next, we have matrices. Identity matrices are square matrices with ones on the diagonal and zeros elsewhere. They can be created with the function tf.eye():

>>> identity = tf.eye(4) # size of the square matrix = 4
>>> with tf.Session() as sess:
        sess.run(identity)

array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]], dtype=float32)

Similarly, there are diagonal matrices, which have values in the diagonal and zeros elsewhere, as shown here:

>>> a = tf.range(1,5,1)
>>> md = tf.diag(a)
>>> mdn = tf.diag([1,2,5,3,2])
>>> with tf.Session() as sess:
        sess.run(md)
        sess.run(mdn)

array([[1, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]], dtype=int32)

array([[1, 0, 0, 0, 0],
[0, 2, 0, 0, 0],
[0, 0, 5, 0, 0],
[0, 0, 0, 3, 0],
[0, 0, 0, 0, 2]], dtype=int32)

We use the tf.transpose() function to transpose the given matrix, as shown here:

>>> a = tf.ones((2,3))
>>> b = tf.transpose(a)
>>> with tf.Session() as sess:
        sess.run(a)
        sess.run(b)

array([[ 1., 1., 1.],
[ 1., 1., 1.]], dtype=float32)

array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]], dtype=float32)

The next matrix operation is matrix multiplication, which is performed by the function tf.matmul(), as shown here:

>>> a = tf.ones((3,2))
>>> b = tf.ones((2,4))
>>> c = tf.matmul(a,b)
>>> with tf.Session() as sess:
        sess.run(a)
        sess.run(b)
        sess.run(c)

array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]], dtype=float32)

array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]], dtype=float32)

array([[ 2., 2., 2., 2.],
[ 2., 2., 2., 2.],
[ 2., 2., 2., 2.]], dtype=float32)
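
To see why every entry of c is 2, recall that multiplying a matrix of shape (3,2) by one of shape (2,4) gives a matrix of shape (3,4), where each entry is the dot product of a row of the first matrix with a column of the second. With all-ones inputs and an inner dimension of 2, each dot product is 1*1 + 1*1 = 2. The following is a minimal pure-Python sketch of that rule; the helper name matmul_rows is hypothetical and this is not how TensorFlow implements tf.matmul().

```python
def matmul_rows(a, b):
    """Multiply two matrices given as lists of rows (illustrative helper)."""
    inner = len(b)
    # The number of columns of a must equal the number of rows of b.
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


a = [[1.0] * 2 for _ in range(3)]  # shape (3, 2), all ones
b = [[1.0] * 4 for _ in range(2)]  # shape (2, 4), all ones
c = matmul_rows(a, b)              # shape (3, 4)

for row in c:
    print(row)
```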

Reshaping a tensor from one shape to another is done with the tf.reshape() function, as shown here:

>>> a = tf.ones((2,4)) # initial shape is (2,4)
>>> b = tf.reshape(a,(8,)) # reshape to a vector of size 8, so the shape is (8,)
>>> c = tf.reshape(a,(2,2,2)) # reshape tensor a to shape (2,2,2)
>>> d = tf.reshape(b,(2,2,2)) # reshape tensor b to shape (2,2,2)
>>> # Thus, tensors c and d hold identical values
>>> with tf.Session() as sess:
        sess.run(a)
        sess.run(b)
        sess.run(c)
        sess.run(d)

array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]], dtype=float32)

array([ 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)

array([[[ 1., 1.],
[ 1., 1.]],
[[ 1., 1.],
[ 1., 1.]]], dtype=float32)

array([[[ 1., 1.],
[ 1., 1.]],
[[ 1., 1.],
[ 1., 1.]]], dtype=float32)
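
A reshape only rearranges elements, so the target shape must contain the same number of elements as the original (8 in the listing above). As a hedged sketch, the snippet below also uses -1 as one dimension, which asks tf.reshape() to infer that dimension from the element count. It inspects only the static shapes, so no session is needed. The names shape_b, shape_c, and mismatch_rejected are illustrative, and the exact exception type for a mismatched shape is assumed to depend on whether TensorFlow is running in graph or eager mode, so both are caught.

```python
import tensorflow as tf

a = tf.ones((2, 4))               # 8 elements

# -1 asks TensorFlow to infer that dimension from the element count.
b = tf.reshape(a, (-1, 2))        # 8 / 2 = 4 rows
c = tf.reshape(a, (2, -1, 2))     # 8 / (2 * 2) = 2 in the middle

shape_b = b.shape.as_list()
shape_c = c.shape.as_list()
print(shape_b, shape_c)

# A target shape with a different element count is rejected.
try:
    tf.reshape(a, (3, 3))         # 9 elements, but a has 8
    mismatch_rejected = False
except (ValueError, tf.errors.InvalidArgumentError):
    mismatch_rejected = True
print(mismatch_rejected)
```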

The flow of computation in TensorFlow is represented as a computational graph, which is an instance of tf.Graph. The graph contains tensor and operation objects, and keeps track of the series of operations and tensors involved. The default instance of the graph can be fetched with tf.get_default_graph():

>>> tf.get_default_graph()

<tensorflow.python.framework.ops.Graph object at 0x7fa3e139b550>
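
To make the graph's bookkeeping concrete, here is a hedged sketch that builds a small graph explicitly, lists the operations it has recorded, and then runs it. It assumes a TensorFlow version that provides tf.compat.v1 (as TensorFlow 2.x does); the operation names "a", "b", and "c" are chosen here purely for illustration.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Every op created in this block is recorded in `graph`.
    a = tf.constant(2.0, name="a")
    b = tf.constant(3.0, name="b")
    c = tf.add(a, b, name="c")
    # Within the block, `graph` is the default graph.
    is_default = tf.compat.v1.get_default_graph() is graph

# The graph keeps track of the operations that were added to it.
op_names = [op.name for op in graph.get_operations()]
print(op_names)

# Nothing has been computed yet; a session executes the graph.
with tf.compat.v1.Session(graph=graph) as sess:
    result = sess.run(c)
print(result)
```

Building the graph and running it are separate steps, which is why the earlier listings print tensor references until a session evaluates them.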

We will explore complex operations, the creation of neural networks, and much more in TensorFlow in the coming chapters.

An introduction to OpenAI Gym

OpenAI Gym, created by the team at OpenAI, is a playground of different environments where you can develop and compare your reinforcement learning algorithms. It is compatible with deep learning libraries such as TensorFlow and Theano.

OpenAI Gym consists of two parts:

The gym open-source library: This consists of many environments for different test problems where you can test your reinforcement learning algorithms. Each environment provides the information you need about its state and action spaces.
The OpenAI Gym service: This allows you to compare the performance of your agent with other trained agents.

For the installation and dependencies, please refer to the following link:

https://gym.openai.com/docs/
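
As a hedged preview of what working with an environment looks like, the sketch below creates an environment, prints its state (observation) and action spaces, and plays one episode with random actions. It assumes the gym package is installed, falling back to Gymnasium (the maintained fork with the same interface) if it is not, and it assumes the CartPole-v1 environment is available. Because the return values of reset() and step() differ between library versions, the sketch accepts both forms. The 500-step cap is an arbitrary safety limit chosen for this example.

```python
try:
    import gym
except ImportError:
    # Assumption: Gymnasium, the maintained fork of Gym, is installed instead.
    import gymnasium as gym

env = gym.make("CartPole-v1")
print(env.observation_space)  # the state space
print(env.action_space)       # the action space

# Newer versions return (observation, info); older ones return the observation.
reset_result = env.reset()
observation = reset_result[0] if isinstance(reset_result, tuple) else reset_result

total_reward = 0.0
steps = 0
done = False
while not done and steps < 500:
    action = env.action_space.sample()  # random action, no learning yet
    step_result = env.step(action)
    if len(step_result) == 5:
        # Newer API: separate terminated and truncated flags.
        observation, reward, terminated, truncated, info = step_result
        done = terminated or truncated
    else:
        # Older API: a single done flag.
        observation, reward, done, info = step_result
    total_reward += reward
    steps += 1

env.close()
print(steps, total_reward)
```

A learning agent replaces the random action.sample() call with a policy that chooses actions from the observation.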

With the basics covered, we can now start implementing reinforcement learning using OpenAI Gym in Chapter 2, Training Reinforcement Learning Agents Using OpenAI Gym.
