The Deep Learning with Keras Workshop

By: Matthew Moocarme, Mahla Abdolahnejad, Ritesh Bhagwat

Overview of this book

New experiences can be intimidating, but not this one! This beginner’s guide to deep learning is here to help you explore deep learning from scratch with Keras, and be on your way to training your first ever neural networks. What sets Keras apart from other deep learning frameworks is its simplicity. With over two hundred thousand users, Keras has a stronger adoption in industry and the research community than any other deep learning framework. The Deep Learning with Keras Workshop starts by introducing you to the fundamental concepts of machine learning using the scikit-learn package. After learning how to perform the linear transformations that are necessary for building neural networks, you'll build your first neural network with the Keras library. As you advance, you'll learn how to build multi-layer neural networks and recognize when your model is underfitting or overfitting to the training data. With the help of practical exercises, you’ll learn to use cross-validation techniques to evaluate your models and then choose the optimal hyperparameters to fine-tune their performance. Finally, you’ll explore recurrent neural networks and learn how to train them to predict values in sequential data. By the end of this book, you'll have developed the skills you need to confidently train your own neural network models.

Long Short-Term Memory (LSTM)

LSTMs are a special kind of RNN designed to mitigate the vanishing and exploding gradient problems. Their architecture is built to retain information over long intervals of time, which makes them capable of learning long-term dependencies that simple RNNs struggle with.

The following diagram displays a standard recurrent network whose repeating module contains a single tanh activation function. This is a simple RNN, and it is the architecture in which we often face the vanishing gradient problem:

Figure 9.12: A simple RNN model
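
To make the vanishing gradient problem concrete, here is a minimal sketch rather than code from the book. It assumes a scalar simple RNN with the update h_t = tanh(w_h * h_{t-1} + w_x * x_t); the function names, weights, and inputs are hypothetical values chosen for illustration. By the chain rule, each time step multiplies the gradient with respect to the initial state by w_h * (1 - h_t^2), so when that factor stays below 1 in magnitude, the gradient shrinks toward zero as the sequence grows:

```python
import math


def rnn_forward(inputs, w_h, w_x, h0=0.0):
    """Run a scalar simple RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    states = [h0]
    for x in inputs:
        states.append(math.tanh(w_h * states[-1] + w_x * x))
    return states


def gradient_wrt_initial_state(states, w_h):
    """Return d h_t / d h_0 for every step t, built up with the chain rule.

    Each step contributes the factor w_h * (1 - h_t**2), where
    (1 - h_t**2) is the derivative of tanh at that step.
    """
    grads = []
    grad = 1.0
    for h_t in states[1:]:
        grad *= w_h * (1.0 - h_t ** 2)
        grads.append(grad)
    return grads


# Hypothetical setup: 50 time steps, constant input, recurrent weight below 1.
inputs = [0.5] * 50
states = rnn_forward(inputs, w_h=0.9, w_x=1.0)
grads = gradient_wrt_initial_state(states, w_h=0.9)

for step in (1, 10, 50):
    print(f"step {step:2d}: d h_t / d h_0 = {grads[step - 1]:.3e}")
```

Running the sketch shows the gradient magnitude dropping at every step, which is why early inputs have almost no influence on the weight updates of a simple RNN trained on long sequences.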

The LSTM architecture is similar to that of a simple RNN, but its repeating module has different components, as shown in the following diagram...