In sequence learning, the objective is to capture both short-term and long-term memory. Short-term memory is captured very well by a standard RNN; however, RNNs are not very effective at capturing long-term dependencies, because the gradient vanishes (or, more rarely, explodes) along the RNN chain over time.
Note
The gradient vanishes when the weights have small values that, multiplied repeatedly across time steps, shrink toward zero; in contrast, when the weights have large values, the repeated products keep growing over time and lead to divergence in the learning process. Long Short-Term Memory (LSTM) was proposed to deal with this issue.
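As a rough illustration (an assumed example, not part of the original recipe), repeatedly multiplying a gradient by a recurrent weight smaller or larger than one either shrinks it toward zero or blows it up; the toy R snippet below makes this concrete for a 50-step chain:

# Toy illustration (assumed example): repeated multiplication by a
# recurrent weight over 50 time steps
steps <- 50
small_weight <- 0.9   # |w| < 1
large_weight <- 1.1   # |w| > 1
small_weight ^ steps  # ~0.005 -> vanishing gradient
large_weight ^ steps  # ~117   -> exploding gradient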
The RNN models in TensorFlow can easily be extended to LSTM models by using BasicLSTMCell. The previous rnn function can be replaced with an lstm function to achieve an LSTM architecture:
# LSTM implementation
lstm <- function(x, weight, bias){
  # Unstack input into step_size
  x = tf$unstack(x, step_size, 1)
  # Define a...
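The snippet above is truncated in the source. Below is a minimal sketch of how the full lstm function might look, assuming the step_size, n_hidden, weight, and bias objects defined for the earlier rnn recipe and a TensorFlow 1.x runtime exposing tf$contrib$rnn; treat it as an illustration rather than the book's exact code:

# Sketch only: assumes step_size and n_hidden are defined as in the
# earlier rnn recipe, and TensorFlow 1.x with the tf$contrib$rnn module
library(tensorflow)

lstm <- function(x, weight, bias){
  # Unstack the input into a list of step_size tensors
  x = tf$unstack(x, step_size, 1)
  # Define a basic LSTM cell with n_hidden units
  lstm_cell = tf$contrib$rnn$BasicLSTMCell(n_hidden, forget_bias = 1.0)
  # Run the static RNN; the first element of the result holds the
  # per-time-step outputs
  outputs_states = tf$contrib$rnn$static_rnn(lstm_cell, x, dtype = tf$float32)
  outputs = outputs_states[[1]]
  # Linear activation on the output of the last time step
  last_output = outputs[[length(outputs)]]
  tf$add(tf$matmul(last_output, weight), bias)
}

With such a definition, the rest of the recipe (cost, optimizer, and training loop) should be able to stay unchanged, with the call to rnn(...) simply swapped for lstm(...).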