Applying dropout inside recurrent neural networks has long been a subject of research, since the naïve application of dropout to the recurrent connections introduces instability and makes the RNN much harder to train.
A solution, derived from variational Bayesian theory, has been found. The resulting idea is very simple: preserve the same dropout mask for the whole sequence on which the RNN is trained, and generate a new dropout mask for each new sequence, as shown in the following figure:
Such a technique is called a variational RNN. For the connections represented by identical arrows in the preceding figure, we'll keep the noise mask constant for the whole sequence.
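To make the idea concrete, here is a minimal NumPy sketch (all names in it are hypothetical, not from this chapter's code): a single Bernoulli mask is sampled per sequence, reused at every timestep of the recurrent loop, and resampled before the next sequence:

import numpy as np

def sample_mask(shape, p_drop, rng):
    # One Bernoulli mask per sequence, with inverted-dropout rescaling
    keep = 1.0 - p_drop
    return rng.binomial(n=1, p=keep, size=shape).astype(np.float32) / np.float32(keep)

rng = np.random.RandomState(0)
seq_len, hidden = 5, 4
W = (0.1 * rng.randn(hidden, hidden)).astype(np.float32)
h = np.zeros(hidden, dtype=np.float32)

mask = sample_mask((hidden,), 0.3, rng)   # sampled once for the whole sequence
for t in range(seq_len):
    # the SAME mask is applied to the recurrent state at every timestep
    h = np.tanh((h * mask).dot(W))
# a NEW mask would be sampled here before the next training sequence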
For that purpose, we'll introduce the symbolic variables is_training and noise_x to add random (variational) noise (dropout) to the input, output, and recurrent connections during training:
is_training = T.iscalar('is_training')
noise_x = T.matrix('noise_x')
inputs = apply_dropout...
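The last call is truncated in the text, and the body of apply_dropout is not shown here; the following is a plausible sketch of such a helper, assuming the mask is sampled on the host, fed in through noise_x, and applied only when is_training is nonzero:

import theano.tensor as T
from theano.ifelse import ifelse

def apply_dropout(is_training, h, dropout_mask):
    # During training, multiply by the precomputed variational mask;
    # at test time, pass activations through unchanged, which is
    # consistent with an inverted-dropout (rescaled) mask.
    return ifelse(T.neq(is_training, 0), h * dropout_mask, h)

x = T.matrix('x')   # hypothetical input batch, just for illustration
inputs = apply_dropout(is_training, x, noise_x)   # is_training, noise_x as defined above

With this setup, each training call samples a fresh noise_x on the host for the new sequence and passes it in together with is_training=1, while at inference time is_training=0 and the mask is simply ignored.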