The Deep Learning Workshop
LSTMs are built on plain RNNs: if you simplified an LSTM by removing all of its gates, retaining only the tanh function for the hidden state update, you would be left with a plain RNN. The information at time t – the new input xₜ and the previous hidden state hₜ₋₁ – passes through four times as many weighted activations in an LSTM as it does in a plain RNN: one in the forget gate, two in the update gate (the sigmoid gate and the tanh candidate), and one in the output gate. The number of weights/parameters in an LSTM is therefore four times the number in a plain RNN.
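To make the four-to-one ratio concrete, here is a minimal sketch, assuming TensorFlow 2.x/Keras; the hidden size n = 64 and input dimension m = 100 are illustrative choices, not values from the text:

```python
# Compare parameter counts of a plain RNN and an LSTM with the same
# hidden size n and input dimension m (illustrative values).
import tensorflow as tf

n, m = 64, 100  # hidden units, input features

rnn = tf.keras.layers.SimpleRNN(n)
lstm = tf.keras.layers.LSTM(n)

# Build both layers for inputs of shape (batch, timesteps, features).
rnn.build((None, None, m))
lstm.build((None, None, m))

print(rnn.count_params())   # 10560 = n*m + n*n + n (one weighted transformation)
print(lstm.count_params())  # 42240 = 4 * 10560 (forget, update sigmoid, candidate, output)
```

The ratio is exactly 4.0 because each of the LSTM's four transformations has its own input weights, recurrent weights, and bias of the same shapes as the plain RNN's single transformation.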
In Chapter 5, Deep Learning for Sequences, in the section titled Parameters in an RNN, we calculated the number of parameters in a plain RNN and saw that we already have quite a few parameters to work with (n² + nk + nm, where n is the number of neurons in the hidden layer, m is the number of inputs, and k is the dimension of the output...
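As a quick check of that count, the following sketch builds a plain RNN with a dense output layer, again assuming TensorFlow 2.x/Keras with illustrative sizes. Note that Keras layers also include bias terms (n for the recurrent layer, k for the output layer), which the n² + nk + nm expression omits:

```python
# Verify the n² + nk + nm weight count against a plain RNN plus an output layer.
import tensorflow as tf

n, m, k = 64, 100, 10  # hidden neurons, input dimension, output dimension

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, m)),   # (timesteps, features)
    tf.keras.layers.SimpleRNN(n),      # n*n recurrent + n*m input weights + n biases
    tf.keras.layers.Dense(k),          # n*k output weights + k biases
])

weights_only = n**2 + n*k + n*m        # the chapter's formula: 11136
print(model.count_params())            # 11210 = 11136 + (n + k) bias terms
```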