Keras Deep Learning Cookbook

By: Rajdeep Dua, Sujit Pal, Manpreet Singh Ghotra

Overview of this book

Keras has quickly emerged as a popular deep learning library. Written in Python, it allows you to train convolutional as well as recurrent neural networks with speed and accuracy. The Keras Deep Learning Cookbook shows you how to tackle different problems encountered while training efficient deep learning models, with the help of the popular Keras library. Starting with installing and setting up Keras, the book demonstrates how you can perform deep learning with Keras on TensorFlow. From loading data to fitting and evaluating your model for optimal performance, you will work through a step-by-step process to tackle the problems commonly faced while training deep models. You will implement convolutional and recurrent neural networks, adversarial networks, and more with the help of this handy guide. In addition, you will learn how to train these models for real-world image and language processing tasks. By the end of this book, you will have a practical, hands-on understanding of how you can leverage the power of Python and Keras to perform effective deep learning.

Optimization with AdaDelta

AdaDelta addresses the continually shrinking learning rate in AdaGrad. In AdaGrad, the effective learning rate is the base rate divided by the square root of the sum of all past squared gradients. At each step, another squared gradient is added to the sum, so the denominator grows constantly and the effective learning rate keeps shrinking. Instead of summing all prior squared gradients, AdaDelta uses a sliding window, which allows the sum to decrease.
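To make the shrinking rate concrete, here is a minimal pure-Python sketch of how AdaGrad's effective learning rate evolves. The function name, the base rate of 0.1, and the constant gradient stream are illustrative assumptions, not taken from the book.

```python
import math

def adagrad_effective_rates(gradients, base_lr=0.1, eps=1e-8):
    """Return the effective learning rate AdaGrad would apply at each step.

    Hypothetical helper for illustration; base_lr and eps are assumed values.
    """
    accumulated = 0.0  # running sum of ALL past squared gradients
    rates = []
    for g in gradients:
        accumulated += g * g  # the sum can only grow
        rates.append(base_lr / (math.sqrt(accumulated) + eps))
    return rates

# Even with a constant gradient, the effective rate keeps falling.
rates = adagrad_effective_rates([1.0] * 5)
print(rates)
```

Because the accumulated sum never decreases, the rate falls at every step, which is the behavior AdaDelta is designed to avoid.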



AdaDelta is an extension of AdaGrad that seeks to reduce its aggressive, monotonically decreasing learning rate. Instead of accumulating all past squared gradients, AdaDelta restricts the window of accumulated past gradients to some fixed size, w.
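The fixed-size window can be sketched literally by keeping only the last w squared gradients. This is an illustrative sketch of the windowing idea only, not how AdaDelta is implemented in practice; the function name and sample values are assumptions.

```python
from collections import deque

def windowed_squared_sums(gradients, w=3):
    """Sum of squared gradients over a sliding window of size w (hypothetical helper)."""
    window = deque(maxlen=w)  # silently drops the oldest entry once full
    sums = []
    for g in gradients:
        window.append(g * g)
        sums.append(sum(window))
    return sums

# As large early gradients leave the window, the sum is free to decrease.
sums = windowed_squared_sums([3.0, 2.0, 1.0, 0.5, 0.5], w=3)
print(sums)
```

Unlike AdaGrad's ever-growing total, this sum drops once older, larger gradients fall out of the window, at the cost of storing w values per parameter.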

Instead of inefficiently storing w past squared gradients, the sum of gradients is recursively defined as a decaying average of all past squared gradients. The running average, $E[g^2]_t$, at time step t then depends only on the previous average and the current gradient, weighted by a fraction γ (similar to the momentum term):

$$E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2$$

Where $E[g^2]_t$ is the decaying average of the squared gradients...
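The recursive running average can be sketched in a few lines. This assumes the commonly used form, in which the previous average is weighted by γ and the current squared gradient by (1 - γ); the function name and the value γ = 0.5 are chosen only to keep the arithmetic easy to follow.

```python
def decaying_average(gradients, gamma=0.9):
    """Exponentially decaying average of squared gradients (hypothetical helper).

    Assumed update: avg_t = gamma * avg_{t-1} + (1 - gamma) * g_t ** 2
    """
    avg = 0.0  # running average before any gradient has been seen
    history = []
    for g in gradients:
        avg = gamma * avg + (1.0 - gamma) * g * g
        history.append(avg)
    return history

# Only one number is stored per parameter, yet old gradients still fade out.
history = decaying_average([2.0, 2.0, 0.0, 0.0], gamma=0.5)
print(history)
```

Only the previous average is kept, so memory use does not depend on a window size, and the average shrinks again once the gradients become small. In Keras, the Adadelta optimizer exposes this decay factor as its `rho` argument.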