Keras Deep Learning Cookbook

By: Rajdeep Dua, Sujit Pal, Manpreet Singh Ghotra

Overview of this book

Keras has quickly emerged as a popular deep learning library. Written in Python, it allows you to train convolutional as well as recurrent neural networks with speed and accuracy. The Keras Deep Learning Cookbook shows you how to tackle different problems encountered while training efficient deep learning models, with the help of the popular Keras library. Starting with installing and setting up Keras, the book demonstrates how you can perform deep learning with Keras on TensorFlow. From loading data to fitting and evaluating your model for optimal performance, you will work through a step-by-step process to tackle every possible problem faced while training deep models. You will implement convolutional and recurrent neural networks, adversarial networks, and more with the help of this handy guide. In addition, you will learn how to train these models for real-world image and language processing tasks. By the end of this book, you will have a practical, hands-on understanding of how you can leverage the power of Python and Keras to perform effective deep learning.

Dueling DQN to play CartPole

In this section, we will look at a modification of the original DQN network called the Dueling DQN network. This architecture explicitly separates the representation of state values and (state-dependent) action advantages. It consists of two streams that represent the value and advantage functions while sharing a common convolutional feature learning module.

The two streams are combined via an aggregating layer to produce an estimate of the state-action value function Q, as shown in the following diagram:

A single stream Q network (top) and the dueling Q network (bottom).

The dueling network has two streams to separately estimate the (scalar) state value (referred to as V(...)) and the advantages (referred to as A(...)) for each action; the green output module implements the following equation to combine them. Both networks output Q values for each action.
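
As a rough illustration of the two streams and the aggregating module described above, the following NumPy sketch computes Q values for a batch of states. It is not the book's implementation: the dense shared layer (standing in for the convolutional module), the layer sizes, the random weights, the helper names `init_params`, `aggregate`, and `dueling_q_values`, and the CartPole dimensions (4 state variables, 2 actions) are all assumptions made for illustration. The aggregation assumed here is the mean-subtracted form from the dueling network paper, Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a')).

```python
import numpy as np


def init_params(state_dim=4, hidden=16, n_actions=2, seed=0):
    """Random weights for a tiny dueling network (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W_shared": rng.normal(scale=0.1, size=(state_dim, hidden)),
        "b_shared": np.zeros(hidden),
        "W_value": rng.normal(scale=0.1, size=(hidden, 1)),
        "b_value": np.zeros(1),
        "W_adv": rng.normal(scale=0.1, size=(hidden, n_actions)),
        "b_adv": np.zeros(n_actions),
    }


def aggregate(value, advantage):
    """Combine V(s) and A(s, a) into Q(s, a).

    Assumed form: Q = V + (A - mean over actions of A).
    value has shape (batch, 1); advantage has shape (batch, n_actions).
    """
    return value + advantage - advantage.mean(axis=1, keepdims=True)


def dueling_q_values(states, params):
    """Forward pass: shared features, then value and advantage streams."""
    # Shared feature module (a dense ReLU layer in this sketch).
    features = np.maximum(0.0, states @ params["W_shared"] + params["b_shared"])
    # Value stream: one scalar V(s) per state.
    value = features @ params["W_value"] + params["b_value"]
    # Advantage stream: one A(s, a) per action.
    advantage = features @ params["W_adv"] + params["b_adv"]
    # Aggregating layer: one Q value per action.
    return aggregate(value, advantage), value, advantage


params = init_params()
states = np.random.default_rng(1).normal(size=(3, 4))  # batch of 3 made-up states
q, value, advantage = dueling_q_values(states, params)
print(q.shape)  # (3, 2)
```

With this choice of aggregation, the average of Q over the actions equals V(s) for every state, which is what keeps the value and advantage streams from drifting by an arbitrary constant.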

Instead of defining Q directly, we will use the following simple equation:

A term...