Deep Learning with TensorFlow and Keras, Third Edition
Optimizing a transformer involves building lightweight, responsive, and energy-efficient models. Let’s see the most common ideas adopted to optimize a model.
The key idea behind quantization is to approximate the weights of a network at a lower numerical precision, for example by storing 32-bit floating-point values as 8-bit integers. The idea is very simple, but it works quite well in practice. If you are interested in knowing more, we recommend the paper A Survey of Quantization Methods for Efficient Neural Network Inference, by Amir Gholami et al., https://arxiv.org/pdf/2103.13630.pdf.
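To make the idea concrete, here is a minimal sketch of one common scheme, affine (asymmetric) quantization of a float32 weight array to int8, written in plain NumPy. The function names `quantize_int8` and `dequantize` are hypothetical helpers chosen for illustration, not part of any library, and real toolchains typically add per-channel scales and calibration that this sketch leaves out.

```python
import numpy as np

def quantize_int8(weights):
    """Affine quantization of a float array to int8 (illustrative sketch)."""
    w_min, w_max = float(weights.min()), float(weights.max())
    # The scale maps the observed float range onto the 256 int8 levels.
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # constant tensor: any non-zero scale works
    # The zero point shifts the range so that w_min lands on -128.
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float weights."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale, zero_point = quantize_int8(w)
w_hat = dequantize(q, scale, zero_point)
print("bytes before:", w.nbytes, "bytes after:", q.nbytes)
print("max abs error:", float(np.max(np.abs(w - w_hat))))
```

The int8 array takes a quarter of the storage of the float32 original, and the reconstruction error of each weight is bounded by roughly half of `scale`, which is why the approximation tends to be tolerable for inference.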
The key idea behind weight pruning is to remove some connections in the network. Magnitude-based weight pruning gradually zeroes out the model weights with the smallest magnitudes during training to increase model sparsity. This simple technique has benefits both in terms of model size and in cost of serving. Sparse...
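As an illustration of magnitude-based pruning, the following sketch zeroes out the smallest-magnitude entries of a weight array and ramps the target sparsity up over a number of steps. It uses plain NumPy rather than a library pruning API; `magnitude_prune` and `sparsity_at` are hypothetical helper names, and the linear ramp is an assumed schedule chosen for simplicity.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of entries with the smallest |w|."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    flat = np.abs(weights).ravel()
    # Indices of the k smallest-magnitude entries (order among them is arbitrary).
    prune_idx = np.argpartition(flat, k - 1)[:k]
    mask = np.ones(weights.size, dtype=bool)
    mask[prune_idx] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask

def sparsity_at(step, total_steps, final_sparsity):
    """Hypothetical linear ramp from 0 to `final_sparsity`."""
    return final_sparsity * min(step / total_steps, 1.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
for step in (0, 5, 10):
    pruned, mask = magnitude_prune(w, sparsity_at(step, 10, 0.5))
    print("step", step, "sparsity", 1.0 - float(mask.mean()))
```

In an actual training loop, the mask would be recomputed or reapplied after each weight update so that pruned connections stay at zero while the remaining weights continue to adapt.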