In this chapter, we discussed how to implement a model training and serving pipeline using the data parallelism paradigm.
First, we illustrated the whole data parallel training pipeline and defined the key functions in each step. Next, we showed how to implement data parallel training in both single-machine multi-GPU and multi-machine multi-GPU settings, concluding that a multi-process implementation outperforms a single process with multiple threads. We then discussed how to add fault tolerance to a data parallel training job, showed how to conduct model evaluation and hyperparameter tuning in parallel, and finally demonstrated how to implement data parallel model serving.
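To recap the core idea, the following is a minimal, framework-free sketch of multi-process data parallel training: the dataset is sharded across workers, each worker process computes the gradient on its own shard, and the gradients are averaged each step (mimicking an all-reduce) before updating the shared weight. All names here (`local_grad`, `train`) are our own illustrative choices, not part of any library, and a real job would use a framework such as PyTorch's DistributedDataParallel instead.

```python
# Illustrative sketch: data parallel training of a 1-D linear model y = w * x
# by gradient descent, with one worker process per data shard.
from multiprocessing import Pool


def local_grad(args):
    """Gradient of the mean squared error on one worker's data shard."""
    w, shard = args
    n = len(shard)
    return sum(2 * (w * x - y) * x for x, y in shard) / n


def train(shards, lr=0.01, steps=50):
    """Run synchronous data parallel SGD with one process per shard."""
    w = 0.0
    with Pool(len(shards)) as pool:
        for _ in range(steps):
            # Each worker computes its local gradient in parallel.
            grads = pool.map(local_grad, [(w, s) for s in shards])
            # Averaging the gradients plays the role of an all-reduce.
            w -= lr * sum(grads) / len(grads)
    return w


if __name__ == "__main__":
    data = [(x, 3.0 * x) for x in range(1, 9)]  # ground truth: w = 3
    shards = [data[0::2], data[1::2]]           # two workers, two shards
    print(train(shards))                        # converges near 3.0
```

The design choice this sketch highlights is the one discussed in the chapter: each shard is handled by a separate OS process rather than a thread, so workers do not contend for a single interpreter or device context, at the cost of an explicit gradient exchange step.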
In the next chapter, we will discuss the bottlenecks in current data parallelism solutions and present techniques that mitigate these bottlenecks and boost end-to-end performance.