#### Overview of this book

With the surge of artificial intelligence in applications catering to both business and consumer needs, deep learning is more important than ever for meeting current and future market demands. With this book, you'll explore deep learning and learn how to put machine learning to use in your projects. This second edition of Python Deep Learning will get you up to speed with deep learning, deep neural networks, and how to train them with high-performance algorithms and popular Python frameworks. You'll uncover different neural network architectures, such as convolutional networks, recurrent neural networks, long short-term memory (LSTM) networks, and capsule networks. You'll also learn how to solve problems in the fields of computer vision, natural language processing (NLP), and speech recognition. You'll study generative model approaches such as variational autoencoders and Generative Adversarial Networks (GANs) to generate images. As you delve into the newly evolved area of reinforcement learning, you'll gain an understanding of the state-of-the-art algorithms that are the main components behind agents for popular games such as Go, Atari games, and Dota 2. By the end of the book, you will be well-versed in the theory of deep learning along with its real-world applications.
# RL as a Markov decision process

A Markov decision process (MDP) is a mathematical framework for modeling decisions. We can use it to describe the RL problem. We'll assume that we have full knowledge of the environment. An MDP provides a formal definition of the properties we defined in the previous section (and adds some new ones):

• S is the finite set of all possible environment states, and s_t is the state at time t.
• A is the set of all possible actions, and a_t is the action at time t.
• P is the dynamics of the environment (also known as the transition probabilities matrix). It defines the conditional probability of transitioning to a new state, s', given the existing state, s, and an action, a (for all states and actions):

P(s' | s, a) = Pr(s_{t+1} = s' | s_t = s, a_t = a)

We have transition probabilities between the states, because an MDP is stochastic (it includes randomness). These probabilities represent the...
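The dynamics described above can be sketched in code. The following is a minimal illustration (the two-state environment, the state and action names, and the probability values are all hypothetical, invented for this example, not taken from the book): each (state, action) pair maps to a distribution over next states, and sampling from it plays the role of the environment's stochastic transition.

```python
import random

# Hypothetical two-state MDP. P[s][a] is a list of (next_state, probability)
# pairs, i.e., the conditional distribution P(s' | s, a).
P = {
    "s0": {
        "stay":  [("s0", 0.9), ("s1", 0.1)],
        "move":  [("s1", 1.0)],
    },
    "s1": {
        "move":  [("s0", 1.0)],
        "stay":  [("s1", 0.7), ("s0", 0.3)],
    },
}

def transition(state, action):
    """Sample the next state s' from P(s' | s, a)."""
    next_states, probs = zip(*P[state][action])
    return random.choices(next_states, weights=probs)[0]

# Sanity check: for every (s, a), the outgoing probabilities must sum to 1,
# as required for P to be a valid transition probability matrix.
for s, actions in P.items():
    for a, outcomes in actions.items():
        assert abs(sum(p for _, p in outcomes) - 1.0) < 1e-9
```

Deterministic transitions are just the special case where one outcome has probability 1, so `transition("s0", "move")` always returns `"s1"`, while `transition("s0", "stay")` returns `"s0"` about 90% of the time.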