Wrapping up the whole model parallelism pipeline
In this section, we bring together the components needed to implement model parallelism. We first discuss how to implement a model-parallel training pipeline, and then how to implement a model-parallel serving pipeline.
A model parallel training overview
- After GPU1 consumes the input training batch, it will calculate the activation values of Layer 1.
- After GPU2 receives output from GPU1, GPU2 starts its own forward propagation, which...
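The forward-propagation hand-off described in the steps above can be sketched in plain Python. This is only an illustrative simulation, not a real multi-GPU implementation: the `Stage` class, the `transfer` helper, and the `pipeline_forward` function are hypothetical names, the "devices" are just string labels, and the choice of a ReLU layer and the assumption that GPU2 holds the next layer are ours rather than the text's.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One model partition pinned to a simulated device (hypothetical helper)."""
    device: str    # label only, e.g. "GPU1"; no real GPU is used
    weights: list  # weight matrix as rows, one row per output unit

    def forward(self, batch):
        # Compute ReLU(W @ x) for every sample in the batch.
        return [
            [max(0.0, sum(w * x for w, x in zip(row, sample)))
             for row in self.weights]
            for sample in batch
        ]


def transfer(activations, src, dst, trace):
    # Stand-in for a device-to-device copy of the activation values.
    trace.append(f"{src} -> {dst}")
    return [list(sample) for sample in activations]


def pipeline_forward(batch, stages):
    """Run forward propagation stage by stage, recording the order of events."""
    trace = []
    activations = batch
    for i, stage in enumerate(stages):
        if i > 0:
            # The next device can only start once it receives the
            # previous device's output.
            activations = transfer(
                activations, stages[i - 1].device, stage.device, trace
            )
        trace.append(f"{stage.device}: forward")
        activations = stage.forward(activations)
    return activations, trace


# Assumed toy layout: Layer 1 on GPU1 (2 -> 3 units), next layer on GPU2 (3 -> 2 units).
stages = [
    Stage("GPU1", [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
    Stage("GPU2", [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0]]),
]
output, trace = pipeline_forward([[1.0, 2.0]], stages)
print(trace)
print(output)
```

The trace makes the sequential dependency visible: GPU2 does no work until GPU1's activations have been computed and transferred.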