Before we jump in, we will first learn about techniques that can reduce the number of parameters a model uses. This matters for two reasons: first, a model with fewer parameters should generalize better, because it needs less training data to fit those parameters well; second, fewer parameters mean greater hardware efficiency, as less memory is needed to store them.
Here, we will start by explaining one important technique for reducing model parameters: cascading several small convolutions together. In the diagram that follows, we have two 3x3 convolution layers. If we look at the second layer, on the right of the diagram, and work back, we can see that one neuron in the second layer has a 3x3 receptive field:
When we say "receptive field," we mean the area that it can see from a previous layer. In this example, a 3x3 area is needed to create one output, hence a 3x3 receptive field.
Working back another layer, each element...