# What this book covers

*Chapter 1**, Introduction to Reinforcement Learning*, provides an introduction to RL, presents motivating examples and success stories, and looks at RL applications in industry. It then gives some fundamental definitions to refresh your mind on RL concepts and concludes with a section on software and hardware setup.

*Chapter 2**, Multi-Armed Bandits*, covers a rather simple RL setting, bandit problems without context, which, on the other hand, has tremendous applications in industry as an alternative to the traditional A/B testing. The chapter also describes a very fundamental RL trade-off: exploration versus exploitation. It then presents three approaches to tackle this trade-off and compares them against A/B testing.

*Chapter 3**, Contextual Bandits*, takes the discussion on multi-armed bandits to an advanced level by adding context to the decision-making process and involving deep neural networks in decision making. We adapt a real dataset from the U.S. Census to an online advertising problem. We conclude the chapter with a section on the applications of bandit problems in industry and business.

*Chapter 4**, Makings of a Markov Decision Process*, builds the mathematical theory behind sequential decision processes that are solved using RL. We start with Markov chains, where we describe types of states, ergodicity, transitionary, and steady-state behavior. Then we go into Markov reward and decision processes. Along the way, we introduce return, discount, policy, value functions, and Bellman optimality, which are key concepts in RL theory that will be frequently referred to in later chapters. We conclude the chapter with a discussion on partially observed Markov decision processes. Throughout the chapter, we use a grid world example to illustrate the concepts.

*Chapter 5**, Solving the Reinforcement Learning Problem*, presents and compares dynamic programming, Monte Carlo, and temporal-difference methods, which are fundamental to understanding how to solve a Markov decision process. Key approaches such as policy evaluation, policy iteration, and value iteration are introduced and illustrated. Throughout the chapter, we solve an example inventory replenishment problem. Along the way, we motivate the reader for deep RL methods. We conclude the chapter with a discussion on the importance of simulation in reinforcement learning.

*Chapter 6**, Deep Q-Learning at Scale*, starts with a discussion on why it is challenging to use deep neural networks in reinforcement learning and how modern deep Q-learning addresses those challenges. After a thorough coverage of scalable deep Q-learning methods, we introduce Ray, a distributed computing framework, with which we implement a parallelized deep Q-learning variant. We finish the chapter by introducing RLlib, Ray's own scalable RL library.

*Chapter 7**, Policy-Based Methods*, introduces another important class of RL approaches: policy-based methods. You will first learn how they are different than Q-learning and why they are needed. As we build the theory for contemporary policy-based methods, we also show how you can use RLlib for their application to a sample problem.

*Chapter 8**, Model-Based Methods*, presents how learning a model of the environment can help an RL agent to plan its actions efficiently. In the chapter, we implement and use variants of cross-entropy methods and present Dyna, an RL framework that combines model-free and model-based approaches.

*Chapter 9**, Multi-Agent Reinforcement Learning*, increases gears, goes into multi-agent settings and present the challenges that come with it. In the chapter, we train tic-tac-toe agents through self-play, which you also can play against for fun.

*Chapter 10**, Introducing Machine Teaching*, introduces an emerging concept in RL that focuses on leveraging the subject matter expertise of a human "teacher" to make learning easy for RL agents. We present how reward function engineering, curriculum learning, demonstration learning, and action masking can help with training autonomous agents effectively.

*Chapter 11**, Achieving Generalization and Overcoming Partial Observability* discusses why it is important to be concerned about generalization capabilities of trained RL policies for successful real-world implementations. To this end, the chapter focuses on simulation-to-real gap, connects generalization and partial observability, and introduces domain randomization and memory mechanisms. We also present the CoinRun environment and results on how traditional regularization methods can also help with generalization in RL.

*Chapter 12**, Meta-Reinforcement Learning*, introduces approaches that allow an RL agent to adapt to a new environment once it is deployed for its task. This is one of the most important research directions towards achieving resilient autonomy through RL.

*Chapter 13**, Exploring Advanced Topics*, brings you up to speed with some of the most recent developments in RL, including state-of-the-art distributed RL, SEED RL, approaches that cracked all the Atari benchmarks, Agent57, and RL without simulation, offline RL.

*Chapter 14**, Solving Robot Learning*, goes into implementations of the methods covered in the earlier chapters by training a robot hand to grasp objects using manual and automated curriculum learning in PyBullet, a famous physics simulation in Python.

*Chapter 15**, Supply Chain Management*, gives you hands-on experience in modeling and solving an inventory replenishment problem. Along the way, we perform hyperparameter tuning for our RL agent. The chapter concludes with a discussion on how RL can be applied to vehicle routing problems.

*Chapter 16**, Personalization, Marketing, and Finance* goes beyond bandit models for personalization and discusses a news recommendation problem while introducing dueling bandit gradient descent and action embeddings along the way. The chapter also discusses marketing and finance applications of RL and introduces the TensorTrade library for the latter.

*Chapter 17**, Smart City and Cybersecurity* starts with solving a traffic light contsrol scenario as a multi-agent RL problem using the Flow framework. It then describes how RL can be applied to two other problems: providing ancillary service to a power grid and discovering cyberattacks in it.

*Chapter 18**, Challenges and Future Directions in Reinforcement Learning* wraps up the book by recapping the challenges in RL and connects them to the recent developments and research in the field. Finally, we present practical suggestions for the reader who want to further deepen their RL expertise.