During inference, the whole neural network must be loaded into memory, so as mobile developers we are especially interested in small architectures that consume as little memory as possible. Small neural networks also reduce bandwidth consumption when a model is downloaded over the network.
Several architectures designed to reduce the size of convolutional neural networks have been proposed recently. We will briefly discuss the best-known of them.
The SqueezeNet architecture was proposed by Iandola et al. in 2016 for use in autonomous cars. As the baseline, the researchers took the AlexNet architecture. That network takes 240 MB of memory, which is quite a lot for mobile devices. SqueezeNet has 50x fewer parameters, yet achieves the same level of accuracy on the ImageNet dataset. With additional compression, its size can be reduced to about 0.5 MB.
SqueezeNet is built from fire modules. The objective was to create a neural network...
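To make the parameter savings concrete, here is a minimal sketch of why a fire module (a 1x1 "squeeze" convolution followed by parallel 1x1 and 3x3 "expand" convolutions whose outputs are concatenated) is so much cheaper than a plain 3x3 convolution. The filter counts below are illustrative choices, not necessarily those of any specific layer in the published network, and biases are ignored for simplicity.

```python
def fire_module_params(in_ch, squeeze, expand1x1, expand3x3):
    """Weight count for a fire module: a 1x1 squeeze conv reduces the
    channel count before the 1x1 and 3x3 expand convs are applied."""
    squeeze_w = in_ch * squeeze * 1 * 1
    expand1_w = squeeze * expand1x1 * 1 * 1
    expand3_w = squeeze * expand3x3 * 3 * 3
    return squeeze_w + expand1_w + expand3_w

def plain_conv3x3_params(in_ch, out_ch):
    """Weight count for an ordinary 3x3 convolution."""
    return in_ch * out_ch * 3 * 3

# Illustrative comparison: 128 input channels, 128 output channels.
fire = fire_module_params(128, squeeze=16, expand1x1=64, expand3x3=64)
plain = plain_conv3x3_params(128, 128)
print(fire, plain, plain // fire)  # 12288 147456 12
```

With these (assumed) filter counts, the fire module produces the same 128 output channels with roughly 12x fewer weights, because the cheap 1x1 squeeze layer shrinks the input seen by the expensive 3x3 filters.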