Reinforcement Learning with TensorFlow

By: Sayon Dutta

Overview of this book

Reinforcement learning (RL) allows you to develop smart, quick and self-learning systems in your business surroundings. It's an effective method for training learning agents and solving a variety of problems in Artificial Intelligence - from games, self-driving cars and robots, to enterprise applications such as data center energy saving (cooling data centers) and smart warehousing solutions. The book covers major advancements and successes achieved in deep reinforcement learning by synergizing deep neural network architectures with reinforcement learning. You'll also be introduced to the concept of reinforcement learning, its advantages and the reasons why it's gaining so much popularity. You'll explore MDPs, Monte Carlo tree searches, dynamic programming such as policy and value iteration, and temporal difference learning such as Q-learning and SARSA. You will use TensorFlow and OpenAI Gym to build simple neural network models that learn from their own actions. You will also see how reinforcement learning algorithms play a role in games, image processing and NLP. By the end of this book, you will have gained a firm understanding of what reinforcement learning is and understand how to put your knowledge to practical use by leveraging the power of TensorFlow and OpenAI Gym.

Preface

Reinforcement learning (RL) allows you to develop smart, quick, and self-learning systems in your business surroundings. It is an effective method to train your learning agents and solve a variety of problems in artificial intelligence—from games, self-driving cars, and robots to enterprise applications that range from data center energy saving (cooling data centers) to smart warehousing solutions.

The book covers the major advancements and successes achieved in deep reinforcement learning by combining deep neural network architectures with reinforcement learning. It also introduces the concept of reinforcement learning, its advantages, and the reasons it is gaining so much popularity. It discusses MDPs, Monte Carlo tree search, policy and value iteration, and temporal difference learning methods such as Q-learning and SARSA. You will use TensorFlow and OpenAI Gym to build simple neural network models that learn from their own actions. You will also see how reinforcement learning algorithms play a role in games, image processing, and NLP. By the end of this book, you will have a firm understanding of what reinforcement learning is and how to put your knowledge to practical use by leveraging the power of TensorFlow and OpenAI Gym.

Who this book is for

If you want to get started with reinforcement learning using TensorFlow in the most practical way, this book will be a useful resource. The book assumes prior knowledge of traditional machine learning and linear algebra, as well as some understanding of the TensorFlow framework. No previous experience with reinforcement learning or deep neural networks is required.

What this book covers

Chapter 1, Deep Learning – Architectures and Frameworks, covers the relevant and common deep learning architectures: the basics of logistic regression, neural networks, RNNs, LSTMs, and CNNs. We also give an overview of reinforcement learning, covering the various technologies, frameworks, tools, and techniques, along with what has been achieved so far, the future, and various interesting applications.

Chapter 2, Training Reinforcement Learning Agents Using OpenAI Gym, explains that OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games such as Pong or Breakout. In this chapter, we learn how to use the OpenAI Gym framework to program interesting RL applications.

Chapter 3, Markov Decision Process, discusses the fundamental concepts behind reinforcement learning, such as MDPs, Bellman value functions, POMDPs, value iteration, and sequences of rewards, and trains a reinforcement learning agent using value iteration in an MDP environment from OpenAI Gym.
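
To make the idea of value iteration concrete, here is a minimal sketch in plain Python. It is not taken from the book: the four-state chain MDP, the reward of 1 for reaching the last state, and the names step, value_iteration, and greedy_policy are all assumptions chosen for illustration.

```python
# Hypothetical 4-state chain MDP (an assumption, not the book's environment).
# States 0..3; state 3 is terminal. Actions: 0 = left, 1 = right.
# Moving into state 3 yields reward 1; every other move yields 0.
GAMMA = 0.9
N_STATES = 4
ACTIONS = (0, 1)

def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    if action == 0:
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def value_iteration(theta=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # the terminal state keeps value 0
            backups = []
            for a in ACTIONS:
                s2, r = step(s, a)
                backups.append(r + GAMMA * values[s2])
            best = max(backups)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < theta:
            return values

def greedy_policy(values):
    """Pick, for each non-terminal state, the action with the best backup."""
    policy = []
    for s in range(N_STATES - 1):
        backups = []
        for a in ACTIONS:
            s2, r = step(s, a)
            backups.append(r + GAMMA * values[s2])
        policy.append(backups.index(max(backups)))
    return policy

values = value_iteration()
policy = greedy_policy(values)
print(values, policy)
```

In this toy problem the greedy policy moves right in every state, and the values shrink by a factor of GAMMA for each extra step away from the goal.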

Chapter 4, Policy Gradients, shows a way of implementing reinforcement learning systems by directly deriving the policies. Policy gradients are faster and can work in continuous state-action spaces. We cover the basics of policy gradient such as policy objective functions, temporal difference rule, policy gradients, and actor-critic algorithms. We learn to apply a policy gradient algorithm to train an agent to play the game of Pong.
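
As a rough illustration of what "directly deriving the policy" means, the sketch below performs a single REINFORCE-style update on a softmax policy over two actions. It is a simplified assumption-laden example, not the book's Pong agent: the names softmax and reinforce_step, the learning rate, and the two-action setup are hypothetical.

```python
import math

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(prefs, action, reward, lr=0.1):
    """One REINFORCE update: prefs += lr * reward * grad log pi(action)."""
    probs = softmax(prefs)
    # For a softmax policy, d log pi(action) / d prefs[i]
    # equals 1[i == action] - probs[i].
    return [
        p + lr * reward * ((1.0 if i == action else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]

prefs = [0.0, 0.0]           # start with a uniform policy
before = softmax(prefs)
prefs = reinforce_step(prefs, action=1, reward=1.0)
after = softmax(prefs)
print(before, after)
```

After a positive reward for action 1, the policy assigns that action a higher probability than before, which is the basic mechanism that policy gradient methods scale up with neural networks.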

Chapter 5, Q-Learning and Deep Q-Networks, explains that algorithms such as State-Action-Reward-State-Action (SARSA), MCTS, and DQN have enabled a new era of RL, including AlphaGo. In this chapter, we look at the building blocks of Q-learning and at how deep neural networks (such as CNNs) are applied to create DQN. We also implement SARSA, Q-learning, and DQN to create agents that play Mountain Car, CartPole, and Atari Breakout.
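
The difference between Q-learning and SARSA comes down to the target used in the update. The following sketch shows both tabular updates side by side; the state and action names, ALPHA, GAMMA, and the function names are assumptions made for illustration and are not the book's code.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (illustrative value)
GAMMA = 0.9  # discount factor (illustrative value)

def q_learning_update(q, s, a, r, s2, actions):
    """Off-policy: bootstrap from the best action available in the next state."""
    target = r + GAMMA * max(q[(s2, b)] for b in actions)
    q[(s, a)] += ALPHA * (target - q[(s, a)])

def sarsa_update(q, s, a, r, s2, a2):
    """On-policy: bootstrap from the action actually taken in the next state."""
    target = r + GAMMA * q[(s2, a2)]
    q[(s, a)] += ALPHA * (target - q[(s, a)])

actions = ("left", "right")
q_ql = defaultdict(float)
q_sarsa = defaultdict(float)
for q in (q_ql, q_sarsa):
    q[("s1", "left")] = 1.0
    q[("s1", "right")] = 2.0

# Same transition (s0, right, reward 0, s1) for both methods;
# SARSA additionally needs the next action, here the non-greedy "left".
q_learning_update(q_ql, "s0", "right", 0.0, "s1", actions)
sarsa_update(q_sarsa, "s0", "right", 0.0, "s1", "left")
print(q_ql[("s0", "right")], q_sarsa[("s0", "right")])
```

With identical experience, Q-learning moves its estimate toward the value of the greedy next action, while SARSA moves toward the value of the action the agent actually chose, so the two estimates differ whenever the next action is exploratory.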

Chapter 6, Asynchronous Methods, teaches asynchronous methods: asynchronous one-step Q-learning, asynchronous one-step SARSA, asynchronous n-step Q-learning, and asynchronous advantage actor-critic (A3C). A3C is a state-of-the-art deep reinforcement learning framework. We also implement A3C to create a reinforcement learning agent.

Chapter 7, Robo Everything – Real Strategy Gaming, brings together the RL foundations, technologies, and frameworks to develop RL pipelines and systems. We also discuss system-level strategies that make reinforcement learning problems easier to solve (shaping, curriculum learning, apprenticeship learning, building blocks, and multiconcepts).

Chapter 8, AlphaGo – Reinforcement Learning at Its Best, covers one of AI's best-known success stories: playing and winning the game of Go against the world champion. In this chapter, we look at the algorithms, architectures, pipelines, hardware, training methodologies, and game strategies employed by AlphaGo.

Chapter 9, Reinforcement Learning in Autonomous Driving, illustrates one of the most interesting applications of RL, that is, autonomous driving. There are many use cases, such as multi-lane merging and driving policies for negotiating roundabouts. We cover the challenges in autonomous driving and discuss proposed research-based solutions. We also introduce the famous MIT Deep Traffic simulator to test our reinforcement learning framework.

Chapter 10, Financial Portfolio Management, covers the application of RL techniques in the financial world. Many predict that AI will be the norm in asset management, trading desks, and portfolio management.

Chapter 11, Reinforcement Learning in Robotics, shows another interesting domain in which RL has found a lot of applications: robotics. The challenges of implementing RL in robotics and the probable solutions are covered.

Chapter 12, Deep Reinforcement Learning in Ad Tech, covers topics such as computational advertising challenges, bidding strategies, and real-time bidding by reinforcement learning in display advertising.

Chapter 13, Reinforcement Learning in Image Processing, is about object detection, the most famous domain in computer vision, and how reinforcement learning is being used to tackle it.

Chapter 14, Deep Reinforcement Learning in NLP, illustrates the use of reinforcement learning in text summarization and question answering, which will give you a basic idea of how researchers are reaping the benefits of reinforcement learning in these domains.

Appendix A, Further topics in Reinforcement Learning, gives an introductory overview of some topics that were out of the scope of this book. We mention them briefly and close each with external links so that you can explore them further.

To get the most out of this book

The following are the requirements to get the most out of this book:

  • Python and TensorFlow
  • Linear algebra as a prerequisite for neural networks
  • Installation bundle: Python, TensorFlow, and OpenAI Gym (shown in Chapter 1, Deep Learning – Architectures and Frameworks, and Chapter 2, Training Reinforcement Learning Agents Using OpenAI Gym)

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packtpub.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Reinforcement-Learning-with-TensorFlow. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/ReinforcementLearningwithTensorFlow_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The sigmoid(x) and relu(x) refer to the functions performing sigmoid and ReLU activation calculations respectively."

A block of code is set as follows:

def discretization(env, obs):

    env_low = env.observation_space.low
    env_high = env.observation_space.high

Any command-line input or output is written as follows:

Episode 1 completed with total reward 8433.30289388 in 26839 steps
Episode 2 completed with total reward 3072.93369963 in 8811 steps
Episode 3 completed with total reward 1230.81734028 in 4395 steps
Episode 4 completed with total reward 2182.31111239 in 6629 steps

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Note

Warnings or important notes appear like this.

Tip

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.