GPU-Accelerated Computing with Python 3 and CUDA
We focused on overcoming the limitations of single-GPU computing by using multiple GPUs. We began with the basics of multi-GPU computing, including an overview of multi-GPU systems and, in particular, their network topology. We then discussed two common approaches to parallelism: data parallelism and model parallelism. To illustrate these concepts, we used a simple matrix multiplication example to show how to implement a multi-GPU version with Numba-CUDA. This example showcased the core concepts of data partitioning, data movement between memories, and distributed computing.
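The data-partitioning idea behind the matrix multiplication example can be sketched without any GPU at all. The following is a minimal illustration, not the Numba-CUDA implementation itself: plain Python threads stand in for devices, and the helper names `split_rows` and `multi_device_matmul` are hypothetical. Each "device" receives one contiguous block of rows of `A` plus a full copy of `B`, computes its partial product, and the partial results are concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))  # columns of b, so each output entry is a dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def split_rows(a, num_devices):
    """Split the rows of `a` into `num_devices` contiguous shards of near-equal size."""
    base, extra = divmod(len(a), num_devices)
    shards, start = [], 0
    for i in range(num_devices):
        # The first `extra` shards take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        shards.append(a[start:stop])
        start = stop
    return shards


def multi_device_matmul(a, b, num_devices=2):
    """Data-parallel product: shard A by rows, replicate B, gather the results."""
    shards = split_rows(a, num_devices)
    # Assumption: one worker thread plays the role of one GPU.
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        partials = list(pool.map(lambda shard: matmul(shard, b), shards))
    # pool.map preserves shard order, so concatenation restores the row order.
    return [row for part in partials for row in part]


A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
C = multi_device_matmul(A, B, num_devices=2)
print(C)
```

On real hardware, the per-shard call would presumably be replaced by a device-side kernel launch, and the shard and the copy of `B` would first have to be transferred to that device's memory, which is where the data-movement cost comes from.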
We then scaled up to a multi-node environment with a Dask-CUDA cluster, which let us orchestrate computation across multiple GPUs and machines. We introduced a simple pipeline to show how task dependencies appear in the Dask dashboard and how the information reported by the Dask scheduler can support performance analysis.
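To make the notion of task dependencies concrete, here is a small standard-library sketch of a pipeline whose steps run only after their inputs are ready. It does not use Dask and is not how the Dask scheduler is implemented; the task names (`load`, `double`, `square`, `combine`) and the `run_pipeline` helper are hypothetical, chosen only to show a diamond-shaped dependency graph of the kind a scheduler has to resolve.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def load():
    return [1, 2, 3, 4]


def double(xs):
    return [2 * x for x in xs]


def square(xs):
    return [x * x for x in xs]


def combine(a, b):
    return sum(a) + sum(b)


# Each task maps to (function, names of the tasks whose results it consumes).
tasks = {
    "load": (load, []),
    "double": (double, ["load"]),
    "square": (square, ["load"]),
    "combine": (combine, ["double", "square"]),
}


def run_pipeline(tasks):
    """Run every task after all of its dependencies have produced a result."""
    graph = {name: deps for name, (_, deps) in tasks.items()}
    results = {}
    # static_order() yields each task only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        func, deps = tasks[name]
        results[name] = func(*(results[d] for d in deps))
    return results


results = run_pipeline(tasks)
print(results["combine"])
```

In this sketch, `double` and `square` depend only on `load` and not on each other, so a distributed scheduler would be free to run them on different workers at the same time; `combine` must wait for both.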
Finally, we switched to JAX, re-implementing the multi-GPU matrix multiplication example...