Long Short-Term Memory Networks
As mentioned previously, traditional RNNs retain only short-term memory. This is a problem when dealing with long sequences of data, where the network has trouble carrying information from the earliest steps through to the final ones.
For instance, take the poem "The Raven," which was written by the famous poet Edgar Allan Poe and is over 1,000 words long. Attempting to process it with a traditional RNN, with the objective of generating a similar poem, will result in the model losing crucial information from the first couple of stanzas. This, in turn, may produce an output that is unrelated to the initial subject of the poem. For example, the model could ignore that the poem takes place at night, and so generate a new poem that is not very scary.
This inability to hold long-term memory arises because traditional RNNs suffer from a problem called vanishing gradients. The problem occurs when the gradients, which are used to update the parameters of the network...
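The shrinking effect behind vanishing gradients can be illustrated with a toy calculation. The sketch below is not a real RNN: it assumes a single scalar recurrent weight and a constant activation slope, so that sending a gradient back through each time step simply multiplies it by the same factor. The function name `backprop_gradient_magnitudes` and its parameters are hypothetical and chosen only for illustration.

```python
def backprop_gradient_magnitudes(recurrent_weight, num_steps, activation_slope=1.0):
    """Toy model: gradient magnitude after flowing back through 1..num_steps steps.

    Assumption: each backward step scales the gradient by the same scalar
    factor (recurrent_weight * activation_slope). Real RNNs use weight
    matrices and input-dependent activation derivatives.
    """
    magnitudes = []
    gradient = 1.0  # gradient at the final time step
    for _ in range(num_steps):
        gradient *= recurrent_weight * activation_slope
        magnitudes.append(abs(gradient))
    return magnitudes


# With a per-step factor below 1, the signal reaching early steps collapses.
grads = backprop_gradient_magnitudes(recurrent_weight=0.5, num_steps=50)
print(f"after  1 step : {grads[0]:.3e}")
print(f"after 10 steps: {grads[9]:.3e}")
print(f"after 50 steps: {grads[49]:.3e}")
```

Under these assumptions, the gradient that reaches the earliest steps of a 50-step sequence is many orders of magnitude smaller than the one at the final steps, so the parameters receive almost no update signal from the start of the sequence.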