Book Image

PyTorch 1.x Reinforcement Learning Cookbook

By : Yuxi (Hayden) Liu
Book Image

PyTorch 1.x Reinforcement Learning Cookbook

By: Yuxi (Hayden) Liu

Overview of this book

Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent times. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as the preferred tool for training RL models because of its efficiency and ease of use. With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving problems such as the multi-armed bandit problem and the cartpole problem using the multi-armed bandit algorithm and function approximation. You'll also learn how to use Deep Q-Networks to complete Atari games, along with how to effectively implement policy gradients. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game. By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.
Table of Contents (11 chapters)

Setting up the working environment

Let's get started with setting up the working environment, including the correct versions of Python and Anaconda, and PyTorch as the main framework that is used throughout the book.

Python is the language we use to implement all reinforcement learning algorithms and techniques throughout the book. In this book, we will be using Python 3, or more specifically, 3.6 or above. If you are a Python 2 user, now is the best time for you to switch to Python 3, as Python 2 will no longer be supported after 2020. The transition is very smooth, though, so don't panic.

Anaconda is an open source Python distribution ( for data science and machine learning. We will be using Anaconda's package manager, conda, to install Python packages, along with pip.

PyTorch (, primarily developed by the Facebook AI Research (FAIR) Group, is a trendy machine learning library based on Torch ( Tensors in PyTorch replace NumPy's ndarrays, which provides more flexibility and compatibility with GPUs. Because of the powerful computational graphs and the simple and friendly interface, the PyTorch community is expanding on a daily basis, and it has seen heavy adoption by more and more tech giants.

Let's see how to properly set up all of these components.

How to do it...

We will begin by installing Anaconda. You can skip this if you already have Anaconda for Python 3.6 or 3.7 running on your system. Otherwise, you can follow the instructions at for your operating system, as follows:

Feel free to play around with PyTorch once the setup is done. To verify that you have the right setup of Anaconda and Python, you can enter the following line in your Terminal in Linux/Mac or Command Prompt in Windows (from now on, we will just call it Terminal):


It will display your Python Anaconda environment. You should see something similar to the following screenshot:

If Anaconda and Python 3.x are not mentioned, please check the system path or the path Python is running from.

The next thing to do is to install PyTorch. First, go to and pick the description of your environment from the following table:

Here, we use Mac, Conda, Python 3.7, and running locally (no CUDA) as an example, and enter the resulting command line in the Terminal:

conda install pytorch torchvision -c pytorch

To confirm PyTorch is installed correctly, run the following lines of code in Python:

>>> import torch
>>> x = torch.empty(3, 4)
>>> print(x)
tensor([[ 0.0000e+00, 2.0000e+00, -1.2750e+16, -2.0005e+00],
[ 9.8742e-37, 1.4013e-45, 9.9222e-37, 1.4013e-45],
[ 9.9220e-37, 1.4013e-45, 9.9225e-37, 2.7551e-40]])

If a 3 x 4 matrix is displayed, that means PyTorch is installed correctly.

Now we have successfully set up the working environment.

How it works...

We have just created a tensor of size 3 x 4 in PyTorch. It is an empty matrix. By saying empty, this doesn't mean all elements are of the value Null. Instead, they are a bunch of meaningless floats that are considered placeholders. Users are required to set all the values later. This is very similar to NumPy's empty array.

There's more...

Some of you may question the necessity of installing Anaconda and using conda to manage packages since it is easy to install packages with pip. In fact, conda is a better packaging tool than pip. We mainly use conda for the following four reasons:

  • It handles library dependencies nicely: Installing a package with conda will automatically download all of its dependencies. However, doing so with pip will lead to a warning, and installation will be aborted.
  • It solves conflicts of packages gracefully: If installing a package requires another package of a specific version (let's say 2.3 or after, for example), conda will update the version of the other package automatically.
  • It creates a virtual environment easily: A virtual environment is a self-contained package directory tree. Different applications or projects can use different virtual environments. All virtual environments are isolated from each other. It is recommended to use virtual environments so that whatever we do for one application doesn't affect our system environment or any other environment.
  • It is also compatible with pip: We can still use pip in conda with the following command:
conda install pip

See also

If you are interested in learning more about conda, feel free to check out the following resources:

If you want to get more familiar with PyTorch, you can go through the Getting Started section in the official tutorial at We recommend you at least finish the following: