Case Study – The MAB Problem | Deep Reinforcement Learning with Python

Deep Reinforcement Learning with Python - Second Edition

By : Sudharsan Ravichandiran

Overview of this book

With significant enhancements in the quality and quantity of algorithms in recent years, this second edition of Hands-On Reinforcement Learning with Python has been revamped into an example-rich guide to learning state-of-the-art reinforcement learning (RL) and deep RL algorithms with TensorFlow 2 and the OpenAI Gym toolkit. In addition to exploring RL basics and foundational concepts such as the Bellman equation, Markov decision processes, and dynamic programming algorithms, this second edition dives deep into the full spectrum of value-based, policy-based, and actor-critic RL methods. It explores state-of-the-art algorithms such as DQN, TRPO, PPO, ACKTR, DDPG, TD3, and SAC in depth, demystifying the underlying math and demonstrating implementations through simple code examples. The book has several new chapters dedicated to further RL techniques, including distributional RL, imitation learning, inverse RL, and meta RL. You will learn to use Stable Baselines, an improved fork of OpenAI's Baselines library, to implement popular RL algorithms with little effort. The book concludes with an overview of promising research approaches such as meta-learning and imagination-augmented agents. By the end, you will be able to apply RL and deep RL effectively in your real-world projects.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Fundamentals of Reinforcement Learning

Key elements of RL

The basic idea of RL

The RL algorithm

How RL differs from other ML paradigms

Markov Decision Processes

Fundamental concepts of RL

Applications of RL

RL glossary

Summary

Questions

Further reading

A Guide to the Gym Toolkit

Setting up our machine

Creating our first Gym environment

More Gym environments

Environment synopsis

Summary

Questions

Further reading

The Bellman Equation and Dynamic Programming

The Bellman equation

Dynamic programming

Is DP applicable to all environments?

Summary

Questions

Monte Carlo Methods

Understanding the Monte Carlo method

Prediction and control tasks

Monte Carlo prediction

Monte Carlo control

Is the MC method applicable to all tasks?

Summary

Questions

Understanding Temporal Difference Learning

TD learning

TD prediction

TD control

Comparing the DP, MC, and TD methods

Summary

Questions

Further reading

Case Study – The MAB Problem

The MAB problem

Applications of MAB

Finding the best advertisement banner using bandits

Contextual bandits

Summary

Questions

Further reading

Deep Learning Foundations

Biological and artificial neurons

ANN and its layers

Exploring activation functions

Forward propagation in ANNs

How does an ANN learn?

Putting it all together

Recurrent Neural Networks

LSTM to the rescue

What are CNNs?

The architecture of CNNs

Generative adversarial networks

Total loss

Summary

Questions

Further reading

A Primer on TensorFlow

What is TensorFlow?

Understanding computational graphs and sessions

Variables, constants, and placeholders

Introducing TensorBoard

Handwritten digit classification using TensorFlow

Introducing eager execution

Math operations in TensorFlow

TensorFlow 2.0 and Keras

Summary

Questions

Further reading

Deep Q Network and Its Variants

What is DQN?

Playing Atari games using DQN

The double DQN

DQN with prioritized experience replay

The dueling DQN

The deep recurrent Q network

Summary

Questions

Further reading

Policy Gradient Method

Why policy-based methods?

Policy gradient intuition

Variance reduction methods

Summary

Questions

Further reading

Actor-Critic Methods – A2C and A3C

Overview of the actor-critic method

Advantage actor-critic (A2C)

Asynchronous advantage actor-critic (A3C)

A2C revisited

Summary

Questions

Further reading

Learning DDPG, TD3, and SAC

Deep deterministic policy gradient

Twin delayed DDPG

Soft actor-critic

Summary

Questions

Further reading

TRPO, PPO, and ACKTR Methods

Trust region policy optimization

Proximal policy optimization

Actor-critic using Kronecker-factored trust region

Summary

Questions

Further reading

Distributional Reinforcement Learning

Why distributional reinforcement learning?

Categorical DQN

Quantile Regression DQN

Distributed Distributional DDPG

Summary

Questions

Further reading

Imitation Learning and Inverse RL

Supervised imitation learning

DAgger

Deep Q learning from demonstrations

Inverse reinforcement learning

Generative adversarial imitation learning

Summary

Questions

Further reading

Deep Reinforcement Learning with Stable Baselines

Installing Stable Baselines

Creating our first agent with Stable Baselines

Vectorized environments

Integrating custom environments

Playing Atari games with a DQN and its variants

Lunar lander using A2C

Swinging up a pendulum using DDPG

Training an agent to walk using TRPO

Training a cheetah bot to run using PPO

Implementing GAIL

Summary

Questions

Further reading

Reinforcement Learning Frontiers

Meta reinforcement learning

Hierarchical reinforcement learning

Imagination augmented agents

Summary

Questions

Further reading

Other Books You May Enjoy

Index

Appendix 1 – Reinforcement Learning Algorithms

Reinforcement learning algorithm

Value Iteration

Policy Iteration

First-Visit MC Prediction

Every-Visit MC Prediction

MC Prediction – the Q Function

MC Control Method

On-Policy MC Control – Exploring Starts

On-Policy MC Control – Epsilon-Greedy

Off-Policy MC Control

TD Prediction

On-Policy TD Control – SARSA

Off-Policy TD Control – Q Learning

Deep Q Learning

Double DQN

REINFORCE Policy Gradient

Policy Gradient with Reward-To-Go

REINFORCE with Baseline

Advantage Actor-Critic

Asynchronous Advantage Actor-Critic

Deep Deterministic Policy Gradient

Twin Delayed DDPG

Soft Actor-Critic

Trust Region Policy Optimization

PPO-Clipped

PPO-Penalty

Categorical DQN

Distributed Distributional DDPG

DAgger

Deep Q learning from demonstrations

MaxEnt Inverse Reinforcement Learning

MAML in Reinforcement Learning

Appendix 2 – Assessments

Chapter 1 – Fundamentals of Reinforcement Learning

Chapter 2 – A Guide to the Gym Toolkit

Chapter 3 – The Bellman Equation and Dynamic Programming

Chapter 4 – Monte Carlo Methods

Chapter 5 – Understanding Temporal Difference Learning

Chapter 6 – Case Study – The MAB Problem

Chapter 7 – Deep Learning Foundations

Chapter 8 – A Primer on TensorFlow

Chapter 9 – Deep Q Network and Its Variants

Chapter 10 – Policy Gradient Method

Chapter 11 – Actor-Critic Methods – A2C and A3C

Chapter 12 – Learning DDPG, TD3, and SAC

Chapter 13 – TRPO, PPO, and ACKTR Methods

Chapter 14 – Distributional Reinforcement Learning

Chapter 15 – Imitation Learning and Inverse RL

Chapter 16 – Deep Reinforcement Learning with Stable Baselines

Chapter 17 – Reinforcement Learning Frontiers
