Deep Learning Quick Reference
By Mike Bernico

Optimization algorithms for deep learning

The gradient descent algorithm is not the only optimization algorithm available for optimizing our network weights, but it is the basis for most of the others. While understanding every optimization algorithm out there is likely a PhD's worth of material, we will devote a few sentences to some of the most practical ones.

Using momentum with gradient descent

Using gradient descent with momentum speeds up gradient descent by increasing the speed of learning in directions where the gradient has been consistent, while slowing learning in directions where the gradient fluctuates. It allows the velocity of gradient descent to build up.

Momentum works by introducing a velocity term, and using a weighted moving average of that term in the update rule, as follows:
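In one common form, with velocity v, decay term β, learning rate α, and the gradient ∇_θ J(θ) of the cost with respect to the weights:

v_t = \beta\, v_{t-1} + (1 - \beta)\, \nabla_\theta J(\theta)
\theta_t = \theta_{t-1} - \alpha\, v_t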

The momentum term β is most typically set to 0.9, and this is usually not a hyperparameter that needs to be changed.
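In Keras, for example, momentum is exposed as a single argument on the SGD optimizer. The snippet below is a minimal sketch; the toy architecture, learning rate, and input dimension are placeholders rather than recommendations:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# A toy binary classifier; the architecture here is only a placeholder.
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=20))
model.add(Dense(1, activation='sigmoid'))

# momentum=0.9 is the typical value discussed above; lr is a placeholder.
sgd = SGD(lr=0.01, momentum=0.9)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])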

The RMSProp algorithm

RMSProp is another algorithm that can speed up gradient descent by accelerating learning in some directions and dampening oscillations in others, across the multidimensional space that the network weights represent:
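In one common form, with a running average r of the squared gradient, decay rate ρ (typically 0.9), learning rate α, and a small ε for numerical stability:

r_t = \rho\, r_{t-1} + (1 - \rho)\, \big( \nabla_\theta J(\theta) \big)^2
\theta_t = \theta_{t-1} - \frac{\alpha}{\sqrt{r_t} + \epsilon}\, \nabla_\theta J(\theta)

The square and the division are applied element-wise, so each weight effectively gets its own learning rate.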

This has the effect of reducing oscillations more in directions where the running average of the squared gradient is large.

The Adam optimizer

Adam is one of the best-performing optimizers known, and it's my first choice. It works well across a wide variety of problems. It combines the best parts of both momentum and RMSProp into a single update rule:
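In one common form, Adam keeps a momentum-style first moment m and an RMSProp-style second moment v, with decay rates β1 (typically 0.9) and β2 (typically 0.999) and bias-corrected estimates m̂ and v̂:

m_t = \beta_1\, m_{t-1} + (1 - \beta_1)\, \nabla_\theta J(\theta)
v_t = \beta_2\, v_{t-1} + (1 - \beta_2)\, \big( \nabla_\theta J(\theta) \big)^2
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}
\theta_t = \theta_{t-1} - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}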

Here, ε (epsilon) is some very small number used to prevent division by zero.

Adam is often a great choice, and it's a great place to start when you're prototyping, so save yourself some time by starting with Adam.
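In Keras, switching between these optimizers is a one-line change at compile time. The sketch below reuses the same placeholder architecture as before and passes Adam's typical defaults explicitly; RMSprop is the drop-in alternative:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam, RMSprop

# Placeholder network; substitute your own architecture.
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=20))
model.add(Dense(1, activation='sigmoid'))

# Keras's Adam defaults correspond to beta_1=0.9, beta_2=0.999, and a small epsilon.
model.compile(optimizer=Adam(lr=0.001), loss='binary_crossentropy', metrics=['accuracy'])

# Swapping in RMSProp is the same one-line change:
# model.compile(optimizer=RMSprop(lr=0.001), loss='binary_crossentropy', metrics=['accuracy'])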