Reinforcement Learning with TensorFlow

By: Sayon Dutta

Overview of this book

Reinforcement learning (RL) allows you to develop smart, quick, self-learning systems in your business surroundings. It is an effective method for training learning agents and solving a variety of problems in artificial intelligence, from games, self-driving cars, and robots to enterprise applications such as data center energy saving (cooling data centers) and smart warehousing solutions. The book covers major advancements and successes achieved in deep reinforcement learning by combining deep neural network architectures with reinforcement learning. You'll also be introduced to the concept of reinforcement learning, its advantages, and the reasons why it's gaining so much popularity. You'll explore Markov decision processes (MDPs), Monte Carlo tree search, dynamic programming methods such as policy and value iteration, and temporal difference learning methods such as Q-learning and SARSA. You will use TensorFlow and OpenAI Gym to build simple neural network models that learn from their own actions. You will also see how reinforcement learning algorithms play a role in games, image processing, and NLP. By the end of this book, you will have a firm understanding of what reinforcement learning is and how to put that knowledge to practical use with TensorFlow and OpenAI Gym.

AlphaGo Zero


The first generation of AlphaGo was able to beat professional Go players. In October 2017, Google DeepMind published a paper on AlphaGo Zero in Nature (https://www.nature.com/articles/nature24270). AlphaGo Zero is the latest version of AlphaGo. Earlier versions learned to play the game by training on thousands of human games, ranging from amateur to professional level. AlphaGo Zero, by contrast, learned everything from scratch, that is, from first principles, using neither human data nor human intervention, and still achieved the highest level of performance. It learns to play Go purely by playing against itself. One of its biggest feats was that within 19 hours it had learned the fundamentals of more advanced Go strategies, including life and death, influence, and territory. In just three days AlphaGo Zero defeated the previously published version of AlphaGo (the one that beat Lee Sedol), and within 40...