Applying the highway network design to deep transition recurrent networks leads to the definition of Recurrent Highway Networks (RHN), which predict the output given the input of the transition:
The transition is built with multiple steps of highway connections:
Here the transform gate is as follows:
To reduce the number of weights, the carry gate is taken as the complement of the transform gate:
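As a sketch, the definitions described here can be written in one common notation for Recurrent Highway Networks. The symbols are assumptions chosen for illustration: x^{[t]} is the input at time t, s_l^{[t]} is the state after highway step l out of L, W and R are input and recurrent weights, b are biases, and the indicator 1_{l=1} means the input enters only at the first step. Dropout noise is left out.

```latex
\begin{align}
  s_l^{[t]} &= h_l^{[t]} \odot t_l^{[t]} + s_{l-1}^{[t]} \odot c_l^{[t]},
      \qquad s_0^{[t]} = y^{[t-1]}, \quad y^{[t]} = s_L^{[t]} \\
  h_l^{[t]} &= \tanh\!\left( W_H\, x^{[t]}\, \mathbb{1}_{\{l=1\}} + R_{H_l}\, s_{l-1}^{[t]} + b_{H_l} \right) \\
  t_l^{[t]} &= \sigma\!\left( W_T\, x^{[t]}\, \mathbb{1}_{\{l=1\}} + R_{T_l}\, s_{l-1}^{[t]} + b_{T_l} \right) \\
  c_l^{[t]} &= 1 - t_l^{[t]}
\end{align}
```

With the coupled carry gate in the last line, each highway step needs only the H and T weight sets, and the new state is a gated mix of the candidate h and the previous state.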
For faster computation on a GPU, it is better to compute the linear transformations of the inputs for all time steps in a single large matrix multiplication, producing the all-steps input matrices i_for_H and i_for_T at once, since the GPU parallelizes one large multiplication better than many small ones. These precomputed inputs are then provided to the recurrence:
y_0 = shared_zeros((batch_size, hidden_size))
y, _ = theano.scan(deep_step_fn,
                   sequences=[i_for_H, i_for_T],
                   outputs_info=[y_0],
                   non_sequences=[noise_s])
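The same pattern can be sketched in plain NumPy, to make the data flow explicit: one large matrix multiplication produces the input projections for all time steps, and a Python loop plays the role of the scan. This is a minimal sketch under stated assumptions, not the Theano implementation: `deep_step`, `run_rhn`, the weight names `W_H`, `W_T`, `R_H`, `R_T`, `b_H`, `b_T`, and the toy sizes are all hypothetical, and the dropout noise `noise_s` is left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deep_step(i_H_t, i_T_t, y_prev, R_H, R_T, b_H, b_T):
    """One time step made of several highway micro-steps (hypothetical helper)."""
    s = y_prev
    for l in range(len(R_H)):
        h_pre = s @ R_H[l] + b_H[l]
        t_pre = s @ R_T[l] + b_T[l]
        if l == 0:
            # The precomputed input projections enter only at the first micro-step.
            h_pre = h_pre + i_H_t
            t_pre = t_pre + i_T_t
        h = np.tanh(h_pre)            # candidate state
        t = sigmoid(t_pre)            # transform gate
        s = h * t + s * (1.0 - t)     # carry gate taken as 1 - t
    return s

def run_rhn(x, W_H, W_T, R_H, R_T, b_H, b_T):
    """x has shape (n_steps, batch_size, input_size)."""
    n_steps, batch_size, input_size = x.shape
    hidden_size = W_H.shape[1]
    # Flatten time and batch so each projection is one big matrix multiplication.
    flat = x.reshape(n_steps * batch_size, input_size)
    i_for_H = (flat @ W_H).reshape(n_steps, batch_size, hidden_size)
    i_for_T = (flat @ W_T).reshape(n_steps, batch_size, hidden_size)
    y = np.zeros((batch_size, hidden_size))   # initial state, like y_0
    outputs = []
    for step in range(n_steps):               # explicit loop in place of scan
        y = deep_step(i_for_H[step], i_for_T[step], y, R_H, R_T, b_H, b_T)
        outputs.append(y)
    return np.stack(outputs), i_for_H, i_for_T

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
n_steps, batch_size, input_size, hidden_size, depth = 5, 3, 6, 4, 2
x = rng.normal(size=(n_steps, batch_size, input_size))
W_H = rng.normal(scale=0.1, size=(input_size, hidden_size))
W_T = rng.normal(scale=0.1, size=(input_size, hidden_size))
R_H = [rng.normal(scale=0.1, size=(hidden_size, hidden_size)) for _ in range(depth)]
R_T = [rng.normal(scale=0.1, size=(hidden_size, hidden_size)) for _ in range(depth)]
b_H = [np.zeros(hidden_size) for _ in range(depth)]
b_T = [np.zeros(hidden_size) for _ in range(depth)]

ys, i_for_H, i_for_T = run_rhn(x, W_H, W_T, R_H, R_T, b_H, b_T)
```

Slicing `i_for_H[step]` gives the same values as computing `x[step] @ W_H` inside the loop, so the batching changes only where the work is done, not the result.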
With a deep transition between each step:
def deep_step_fn(i_for_H_t, i_for_T_t, y_tm1, noise_s...