Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Reinforcement Learning with TensorFlow
  • Table Of Contents Toc
Reinforcement Learning with TensorFlow

Reinforcement Learning with TensorFlow

By : Sayon Dutta
2.2 (5)
close
close
Reinforcement Learning with TensorFlow

Reinforcement Learning with TensorFlow

2.2 (5)
By: Sayon Dutta

Overview of this book

Reinforcement learning (RL) allows you to develop smart, quick and self-learning systems in your business surroundings. It's an effective method for training learning agents and solving a variety of problems in Artificial Intelligence - from games, self-driving cars and robots, to enterprise applications such as data center energy saving (cooling data centers) and smart warehousing solutions. The book covers major advancements and successes achieved in deep reinforcement learning by synergizing deep neural network architectures with reinforcement learning. You'll also be introduced to the concept of reinforcement learning, its advantages and the reasons why it's gaining so much popularity. You'll explore MDPs, Monte Carlo tree searches, dynamic programming such as policy and value iteration, and temporal difference learning such as Q-learning and SARSA. You will use TensorFlow and OpenAI Gym to build simple neural network models that learn from their own actions. You will also see how reinforcement learning algorithms play a role in games, image processing and NLP. By the end of this book, you will have gained a firm understanding of what reinforcement learning is and understand how to put your knowledge to practical use by leveraging the power of TensorFlow and OpenAI Gym.
Table of Contents (17 chapters)
close
close

Asynchronous n-step Q-learning


The architecture of asynchronous n-step Q-learning is, to an extent, similar to that of asynchronous one-step Q-learning. The difference is that the learning agent actions are selected using the exploration policy for up to

 steps or until a terminal state is reached, in order to compute a single update of policy network parameters. This process lists 

 rewards from the environment since its last update. Then, for each time step, the loss is calculated as the difference between the discounted future rewards at that time step and the estimated Q-value. The gradient of this loss with respect to thread-specific network parameters for each time step is calculated and accumulated. There are multiple such learning agents running and accumulating the gradients in parallel. These accumulated gradients are used to perform asynchronous updates of policy network parameters.

The pseudo-code for asynchronous n-step Q-learning is shown below. Here, the following are the global...

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Reinforcement Learning with TensorFlow
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist download Download options font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon