So far, we have seen that a neural network model processes data in a number of stages. We already know that a weight exists on every connection between two nodes of adjacent layers. Each layer applies a linear transformation, using these weights, to the values arriving from the previous layer's nodes, and the result passes through a nonlinear activation function to yield the values of the next layer. This is repeated for each subsequent layer and, finally, backpropagation is used to find the optimal values of the weights.
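As a minimal sketch of the forward pass just described, the following NumPy snippet (the layer sizes and the choice of tanh as the activation are illustrative assumptions, not prescribed by the text) applies a linear transformation followed by a nonlinearity at each layer:

```python
import numpy as np

def forward(x, weights, biases):
    """Forward pass: each layer applies a linear transformation
    followed by a nonlinear activation (here, tanh)."""
    a = x
    for W, b in zip(weights, biases):
        z = a @ W + b      # linear transformation with the layer's weights
        a = np.tanh(z)     # nonlinear activation
    return a

rng = np.random.default_rng(0)
# A tiny 2-layer network: 3 inputs -> 4 hidden units -> 1 output
weights = [rng.standard_normal((3, 4)), rng.standard_normal((4, 1))]
biases = [np.zeros(4), np.zeros(1)]
x = rng.standard_normal((2, 3))   # batch of 2 samples
out = forward(x, weights, biases)
print(out.shape)                  # (2, 1)
```

In practice, the weights in `weights` are exactly the quantities that backpropagation adjusts; the next section looks at how their initial values should be chosen.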
For a long time, weights were simply initialized at random. Later, it was realized that the way we initialize a network has a massive impact on the model. Let's look at the ways we can initialize a network:
- Zero initialization: In this kind of initialization, all the initial weights are set to zero. As a result, every neuron in a layer performs exactly the same calculation and produces the same output, and backpropagation then computes identical gradients for all of them, so the neurons never learn to differ from one another. This makes the depth of the network futile: predictions coming out of this network would be no better than those of a network with a single neuron per layer.
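The symmetry problem above is easy to demonstrate. In this sketch (the layer shape and tanh activation are illustrative assumptions), zero-initialized weights give every hidden neuron an identical activation, while a small random initialization breaks the symmetry:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))   # batch of 5 samples, 3 features
b1 = np.zeros(4)

# Zero initialization: every hidden neuron computes the same value.
W1_zero = np.zeros((3, 4))
h_zero = np.tanh(x @ W1_zero + b1)
print(np.allclose(h_zero, h_zero[:, [0]]))   # True: all columns identical

# Small random initialization: neurons produce different activations,
# so their gradients differ and they can learn distinct features.
W1_rand = 0.01 * rng.standard_normal((3, 4))
h_rand = np.tanh(x @ W1_rand + b1)
print(np.allclose(h_rand, h_rand[:, [0]]))   # False: symmetry is broken
```

Because the gradient of each weight depends on the activations, identical activations lead to identical updates, and the zero-initialized network stays symmetric forever.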