Keras Deep Learning Cookbook

By: Rajdeep Dua, Sujit Pal, Manpreet Singh Ghotra

Overview of this book

Keras has quickly emerged as a popular deep learning library. Written in Python, it allows you to train convolutional as well as recurrent neural networks with speed and accuracy. The Keras Deep Learning Cookbook shows you how to tackle different problems encountered while training efficient deep learning models, with the help of the popular Keras library. Starting with installing and setting up Keras, the book demonstrates how you can perform deep learning with Keras on top of TensorFlow. From loading data to fitting and evaluating your model for optimal performance, you will work through a step-by-step process to tackle every possible problem faced while training deep models. You will implement convolutional and recurrent neural networks, adversarial networks, and more with the help of this handy guide. In addition to this, you will learn how to train these models for real-world image and language processing tasks. By the end of this book, you will have a practical, hands-on understanding of how you can leverage the power of Python and Keras to perform effective deep learning.

Optimization with Adam


SGD, in contrast to batch gradient descent, performs a parameter update for each individual training example x(i) and its label y(i):

Θ = Θ − η · ∇Θ J(Θ; x(i), y(i))
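To make the rule concrete, here is a minimal NumPy sketch of one such per-example update; the sgd_step and linear_grad names are hypothetical stand-ins for the general parameter vector and the gradient ∇Θ J, not code from the book or from Keras:

```python
import numpy as np

def sgd_step(theta, grad_fn, x_i, y_i, lr=0.01):
    """One SGD update for a single training example (x_i, y_i)."""
    grad = grad_fn(theta, x_i, y_i)   # gradient of J w.r.t. theta for this example
    return theta - lr * grad          # theta = theta - eta * gradient

# Hypothetical example: squared-error loss of a linear model,
# J = 0.5 * (theta . x - y)^2, whose gradient is (theta . x - y) * x.
def linear_grad(theta, x_i, y_i):
    return (theta @ x_i - y_i) * x_i

theta = np.zeros(3)
x_i, y_i = np.array([1.0, 2.0, 3.0]), 2.0
theta = sgd_step(theta, linear_grad, x_i, y_i, lr=0.01)
```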

Adaptive Moment Estimation (Adam) computes adaptive learning rates for each parameter. Like Adadelta, Adam stores not only a decaying average of past squared gradients but also a momentum-like decaying average of past gradients for each parameter. Adam works well in practice and is one of the most widely used optimization methods today.

Adam stores an exponentially decaying average of past gradients (mt) in addition to a decaying average of past squared gradients (like Adadelta and RMSprop). Intuitively, Adam behaves like a heavy ball with friction rolling down the error surface and settling in a flat minimum. The decaying averages of past gradients and past squared gradients, mt and vt, are computed with the following formulas:

mt = β1mt−1 + (1 − β1)gt

vt = β2vt−1 + (1 − β2)gt²
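In Keras, β1 and β2 surface as the beta_1 and beta_2 arguments of the built-in Adam optimizer. A minimal usage sketch follows, assuming standalone Keras 2.x; the model shape is purely illustrative, and in tf.keras the learning-rate argument is learning_rate rather than lr:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# Illustrative model: 20 input features, 10 output classes.
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(10, activation='softmax'),
])

# lr is the step size eta; beta_1 and beta_2 are the decay rates
# used in the moment recurrences above (Keras defaults shown).
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999)
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```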

mt and vt are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients, respectively.
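As a rough illustration only (not the book's code), the standard Adam formulation bias-corrects these moment estimates, since mt and vt are initialized at zero, and then scales the update by the square root of the second moment. A minimal NumPy sketch of one full step:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update in its standard formulation.

    m and v start at zero; t is the 1-based timestep used for bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad        # decaying mean of gradients (mt)
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # decaying mean of squared gradients (vt)
    m_hat = m / (1.0 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)              # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage for a single step with a hypothetical gradient:
theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
grad = np.array([0.1, -0.2, 0.05])
theta, m, v = adam_step(theta, grad, m, v, t=1)
```

In practice, you would rely on the built-in keras.optimizers.Adam shown earlier rather than hand-rolling the update.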