The first technique we introduce here is called layer freezing. At a high level, we assume that different layers of a model may converge at different stages of the training process. Thus, we can freeze the layers that converge earlier.
Here, freezing refers to the following two operations, sketched in the code after this list:
- We stop computing gradients for the frozen layers during backward propagation.
- We stop updating the weights of the frozen layers.
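As a minimal PyTorch sketch of these two operations, the following code freezes the first layer of a toy three-layer model. The freeze_layer helper, the linear layers standing in for transformer layers, the layer sizes, and the SGD optimizer are all illustrative assumptions, not the book's exact code:

import torch
import torch.nn as nn

# A hypothetical three-layer model standing in for the transformer stack
# discussed in this section.
model = nn.Sequential(
    nn.Linear(512, 512),  # layer 1
    nn.Linear(512, 512),  # layer 2
    nn.Linear(512, 512),  # layer 3
)

def freeze_layer(layer: nn.Module) -> None:
    # Operation 1: stop computing gradients for this layer's parameters
    # during backward propagation.
    for param in layer.parameters():
        param.requires_grad = False

# Suppose layer 1 converges first; freeze it.
freeze_layer(model[0])

# Operation 2: stop updating the frozen weights by constructing the
# optimizer over the remaining trainable parameters only.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01
)

Because the frozen parameters no longer require gradients and are excluded from the optimizer, they incur neither backward computation nor weight updates for the rest of training.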
We illustrate this technique in the following diagram:
Figure 8.1 – Simplified illustration of a three-layer language model
As shown in the preceding diagram, we assume the input data has already been tokenized and can be fed directly into the model during either the model training or the model serving stage. The model has three layers; each is an independent transformer layer, and each transformer layer is allocated to a separate GPU.
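To make this layout concrete, here is a rough sketch of placing one transformer layer per GPU and moving the activations between devices in the forward pass. It assumes three CUDA devices are available; the class name, model dimensions, and input shape are illustrative:

import torch
import torch.nn as nn

class ThreeLayerModel(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # One transformer layer allocated to each GPU.
        self.layer1 = nn.TransformerEncoderLayer(d_model, n_heads).to("cuda:0")
        self.layer2 = nn.TransformerEncoderLayer(d_model, n_heads).to("cuda:1")
        self.layer3 = nn.TransformerEncoderLayer(d_model, n_heads).to("cuda:2")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Move the activations onto each layer's device before applying it.
        x = self.layer1(x.to("cuda:0"))
        x = self.layer2(x.to("cuda:1"))
        x = self.layer3(x.to("cuda:2"))
        return x

model = ThreeLayerModel()
# Tokenized, embedded input: (sequence length, batch size, embedding dimension).
tokens = torch.randn(16, 8, 512)
output = model(tokens)  # the output ends up on cuda:2

In this arrangement, each GPU holds only one layer's parameters and activations, which is what allows per-layer decisions such as freezing to translate directly into per-device savings.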
Now, let's discuss...