Distributed Machine Learning with Python
We discussed Megatron-LM in detail due to its popularity. Now, we will briefly discuss Mesh-TensorFlow in this section.
This approach is straightforward: Mesh-TensorFlow combines data and model parallelism by allowing users to configure two dimensions, namely the batch dimension and the model dimension, as shown in the following diagram:
Figure 9.13 – Mesh-TensorFlow's two-dimensional (2D) parallelism
As shown in the preceding diagram, Mesh-TensorFlow allows users to set the parallelism level along each of these two dimensions independently.
As shown in Figure 9.13, let's assume the user sets both the batch dimension and the model dimension to 2. This means that we use two GPUs for model-parallel training, and we have two data-parallel groups of this two-GPU model parallelism...
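To make this mapping concrete, here is a minimal Python sketch (not the actual Mesh-TensorFlow API; the function name and layout convention are illustrative assumptions) showing how four GPUs could be arranged into a 2x2 mesh: each row is one model-parallel group, and the rows act as data-parallel replicas of each other.

```python
def build_mesh(batch_dim, model_dim):
    """Map GPU ids 0..batch_dim*model_dim-1 onto a (batch, model) grid.

    Illustrative sketch only: Mesh-TensorFlow itself describes the mesh
    via shape/layout strings, not this helper.
    """
    gpus = list(range(batch_dim * model_dim))
    # Each row holds the GPUs of one model-parallel group; different rows
    # are data-parallel replicas that each see a different batch shard.
    return [gpus[row * model_dim:(row + 1) * model_dim]
            for row in range(batch_dim)]

# Batch dimension = 2, model dimension = 2, as in Figure 9.13:
mesh = build_mesh(batch_dim=2, model_dim=2)
print(mesh)  # [[0, 1], [2, 3]]
```

With this layout, GPUs 0 and 1 jointly hold one copy of the model's partitioned layers, GPUs 2 and 3 hold a second copy, and the two copies process different shards of the input batch.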