Book Image

Accelerate Model Training with PyTorch 2.X

By : Maicon Melo Alves
Book Image

Accelerate Model Training with PyTorch 2.X

By: Maicon Melo Alves

Overview of this book

Penned by an expert in High-Performance Computing (HPC) with over 25 years of experience, this book is your guide to enhancing the performance of model training using PyTorch, one of the most widely adopted machine learning frameworks. You’ll start by understanding how model complexity impacts training time before discovering distinct levels of performance tuning to expedite the training process. You’ll also learn how to use a new PyTorch feature to compile the model and train it faster, alongside learning how to benefit from specialized libraries to optimize the training process on the CPU. As you progress, you’ll gain insights into building an efficient data pipeline to keep accelerators occupied during the entire training execution and explore strategies for reducing model complexity and adopting mixed precision to minimize computing time and memory consumption. The book will get you acquainted with distributed training and show you how to use PyTorch to harness the computing power of multicore systems and multi-GPU environments available on single or multiple machines. By the end of this book, you’ll be equipped with a suite of techniques, approaches, and strategies to speed up training , so you can focus on what really matters—building stunning models!
Table of Contents (17 chapters)
Free Chapter
1
Part 1: Paving the Way
4
Part 2: Going Faster
10
Part 3: Going Distributed

Implementing distributed training on multiple GPUs

In this section, we’ll show you how to implement and run distributed training on multiple GPUs using NCCL, the de facto communication backend for NVIDIA GPUs. We’ll start by providing a brief overview of NCCL, after which we will learn how to code and launch distributed training in a multi-GPU environment.

The NCCL communication backend

NCCL stands for NVIDIA Collective Communications Library. As its name suggests, NCCL is a library that provides optimized collective operations for NVIDIA GPUs. Therefore, we can use NCCL to execute collective routines such as broadcast, reduce, and the so-called all-reduce operation. Roughly speaking, NCCL plays the same role as oneCCL does for Intel CPUs.

PyTorch supports NCCL natively, which means that the default installation of PyTorch for NVIDIA GPUs already comes with a built-in NCCL version. NCCL works on single or multiple machines and supports the usage of high-performance...