Python Deep Learning Cookbook

By: Indra den Bakker

Overview of this book

Deep learning is revolutionizing a wide range of industries. For many applications, deep learning has proven to outperform humans by making faster and more accurate predictions. This book provides a top-down and bottom-up approach to demonstrate deep learning solutions to real-world problems in different areas. These applications include computer vision, natural language processing, time series, and robotics. The Python Deep Learning Cookbook presents technical solutions to the issues presented, along with a detailed explanation of the solutions. Furthermore, it discusses the pros and cons of implementing each proposed solution using one of the popular frameworks, such as TensorFlow, PyTorch, Keras, and CNTK. The book includes recipes related to the basic concepts of neural networks, as well as classical network topologies. The main purpose of this book is to provide Python programmers with a detailed list of recipes to apply deep learning to common and not-so-common scenarios.
Table of Contents (21 chapters)
Title Page
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface

Experimenting with different optimizers


The most popular and well-known optimizer is Stochastic Gradient Descent (SGD). This technique is widely used in other machine learning models as well. SGD is an iterative method for finding minima or maxima. There are many popular variants of SGD that try to speed up convergence and require less tuning by using an adaptive learning rate. The following table is an overview of the most commonly used optimizers in deep learning (a short comparison sketch follows the table):

| Optimizer | Hyperparameters | Comments |
| --- | --- | --- |
| SGD | Learning rate, decay | + The learning rate directly impacts performance (a smaller learning rate avoids local minima); - requires more manual tuning; - slow convergence |
| AdaGrad | Learning rate, epsilon, decay | + Adaptive learning rate for all parameters (well suited for sparse data); - the learning rate can become too small, and learning stops |
| AdaDelta | Learning rate, rho, epsilon, decay | + Faster convergence at the start; - slows down near the minimum |
| Adam | Learning rate, beta 1, beta 2, epsilon, decay | + Adaptive learning rate and momentum for all parameters |
| RMSprop | Learning rate... | |
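
One simple way to experiment with these optimizers is to train the same small network several times, changing only the optimizer. The following is a minimal sketch assuming tf.keras; the synthetic dataset, layer sizes, and hyperparameter values are illustrative assumptions rather than part of the original recipe:

```python
# Minimal sketch: compare several Keras optimizers on the same small network.
# The dataset, network size, and hyperparameter values are illustrative only.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import SGD, Adagrad, Adadelta, Adam, RMSprop

# Synthetic binary-classification data (placeholder for a real dataset)
np.random.seed(2017)
X = np.random.rand(1000, 10)
y = (X.sum(axis=1) > 5).astype('float32').reshape(-1, 1)

optimizers = {
    'SGD': SGD(learning_rate=0.01),
    'AdaGrad': Adagrad(learning_rate=0.01),
    'AdaDelta': Adadelta(),
    'Adam': Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999),
    'RMSprop': RMSprop(learning_rate=0.001),
}

for name, optimizer in optimizers.items():
    # Rebuild the model for every optimizer so each run starts from scratch
    model = Sequential([
        Input(shape=(10,)),
        Dense(32, activation='relu'),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy', optimizer=optimizer,
                  metrics=['accuracy'])
    history = model.fit(X, y, epochs=20, batch_size=32, verbose=0)
    print('{}: final training loss = {:.4f}'.format(
        name, history.history['loss'][-1]))
```

For a fairer comparison, you would also fix the weight initialization across runs (for example, by saving the initial weights once and reloading them before each fit), so that differences in the learning curves can be attributed to the optimizer alone.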