GPU-Accelerated Computing with Python 3 and CUDA
In this section, we'll look at how a GPU runs a kernel. Understanding what happens under the hood is the first step to making kernels run faster.
To see what happens when a kernel runs on the device, and to become familiar with the key terms, we'll start with the high-level hardware architecture. Figure 5.1 shows the most important components of a GPU:

Figure 5.1 – A diagram showing the hardware architecture of a GPU
The core computational units of a GPU are the streaming multiprocessors (SMs). Each SM contains multiple arithmetic logic units (ALUs), also referred to as CUDA cores, which perform basic arithmetic operations (addition, subtraction, multiplication, ...) for kernel threads. The term "CUDA core" specifically denotes ALUs handling 32-bit floating-point (F32) and 32-bit integer (I32) operations. Some CUDA cores are dedicated to F32, while others support both F32...
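To make the relationship between SMs and CUDA cores concrete, here is a minimal sketch in plain Python that multiplies the number of SMs by the number of F32 cores per SM. The `GpuSpec` class is a hypothetical helper introduced only for this illustration, and the figures used (108 SMs with 64 F32 cores each) are taken from NVIDIA's published specification for the A100; other architectures use different values per SM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Hypothetical container for a few headline hardware figures."""
    name: str
    sm_count: int           # number of streaming multiprocessors
    fp32_cores_per_sm: int  # F32-capable CUDA cores in each SM

    @property
    def total_fp32_cores(self) -> int:
        # Every SM carries the same number of cores, so the device
        # total is a simple product.
        return self.sm_count * self.fp32_cores_per_sm


# Assumed figures: NVIDIA A100 as publicly specified (108 SMs, 64 F32 cores per SM).
a100 = GpuSpec(name="NVIDIA A100", sm_count=108, fp32_cores_per_sm=64)

print(f"{a100.name}: {a100.total_fp32_cores} F32 CUDA cores")
# prints "NVIDIA A100: 6912 F32 CUDA cores"
```

The product is only a headline figure. How many of those cores a kernel actually keeps busy depends on how its threads are distributed across the SMs.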