In this section, we will discuss some of the important hyperparameters involved in model parallel training, such as balancing the workload among GPUs and enabling or disabling pipeline parallelism.
In most cases, we split the model layer-wise. Since we use homogeneous GPUs, we should try to balance the workload evenly among all the GPUs we have.
A GPU's workload is not always linearly proportional to the number of layers it holds. One way to balance the workload among GPUs is to look at each GPU's compute core utilization, which nvidia-smi reports. For example, the following screenshot shows that GPU0 has a heavier workload than GPU1 – Volatile GPU-Util on GPU0 is 42%, whereas on GPU1 it is 20%:
Figure 7.12 – GPUs are underutilized
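Rather than reading the nvidia-smi output by eye, we can also poll the same utilization figure programmatically. The following is a minimal sketch that shells out to nvidia-smi's CSV query interface; the helper name gpu_utilization is our own and not part of any library:

```python
import subprocess

def gpu_utilization():
    """Return a {gpu_index: utilization_percent} map by querying nvidia-smi.

    Uses nvidia-smi's CSV query mode, which prints one line per GPU,
    e.g. "0, 42" and "1, 20" for the situation in Figure 7.12.
    """
    out = subprocess.check_output(
        ['nvidia-smi',
         '--query-gpu=index,utilization.gpu',
         '--format=csv,noheader,nounits'],
        text=True)
    return {int(idx): int(util)
            for idx, util in (line.split(', ') for line in out.strip().splitlines())}

# Example output matching the screenshot: {0: 42, 1: 20}
print(gpu_utilization())
```

A gap like 42% versus 20% signals that the layer assignment is skewed toward GPU0.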
Thus, we need to move some of the layers originally assigned to GPU0 over to GPU1...
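As a rough illustration of what such a rebalancing looks like in code, the following sketch (a hypothetical eight-layer PyTorch model, not the model used in this chapter) splits the layers at an adjustable index; lowering split_at shifts layers off the busier GPU0 onto GPU1:

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    """Layer-wise model-parallel sketch: the first `split_at` layers live on
    cuda:0 and the remaining layers on cuda:1."""
    def __init__(self, layers, split_at):
        super().__init__()
        self.part0 = nn.Sequential(*layers[:split_at]).to('cuda:0')
        self.part1 = nn.Sequential(*layers[split_at:]).to('cuda:1')

    def forward(self, x):
        # Move activations to each partition's device before running it.
        x = self.part0(x.to('cuda:0'))
        return self.part1(x.to('cuda:1'))

# Hypothetical 8-layer model: if GPU0 is overloaded with split_at=5,
# dropping it to 4 moves one layer's work onto GPU1.
layers = [nn.Linear(1024, 1024) for _ in range(8)]
model = TwoGPUModel(layers, split_at=4)
```

After adjusting the split point, we can rerun training and check the utilization readings again until the two GPUs report similar values.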