GPU-Accelerated Computing with Python 3 and CUDA
In our previous example, all CUDA operations queued on multiple streams were issued from a single CPU thread. This can become a bottleneck if preparing and submitting tasks to the streams requires significant CPU processing or if the host is blocked on I/O, because a single thread then serializes work that the GPU could otherwise overlap. In such cases, multiple CPU threads can issue work to the GPU concurrently, as shown in the following schematic diagram. The CUDA runtime API is thread-safe, so each CPU thread can independently create and use its own CUDA stream. This allows the CPU to keep feeding work to the GPU while other streams are still processing data:

Figure 6.7 – Multiple CPU threads supplying data to each CUDA stream
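The pattern in the figure can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own listing: it assumes CuPy as the GPU library (the text above does not name one), and the names `process_chunk` and `run`, the chunk contents, and the `2 * sqrt(x)` workload are all hypothetical. Each worker task creates its own stream and issues its copy and kernel on it. According to the CuPy documentation, the current stream is tracked per thread, so entering a stream in one thread should not affect the others. If CuPy or a CUDA device is not available, the sketch falls back to a CPU stand-in so that the threading structure can still be run.

```python
import math
from concurrent.futures import ThreadPoolExecutor

try:
    import cupy as cp  # assumption: CuPy is the GPU library in use
    cp.cuda.runtime.getDeviceCount()  # raises if no CUDA device is usable
except Exception:
    cp = None  # fall back to a CPU stand-in so the sketch still runs


def process_chunk(chunk):
    """Compute 2 * sqrt(x) for one chunk on a stream owned by this call."""
    if cp is None:
        return [2.0 * math.sqrt(x) for x in chunk]  # CPU stand-in, no stream
    stream = cp.cuda.Stream(non_blocking=True)  # one stream per worker call
    with stream:  # work issued in this block is queued on this stream
        data = cp.asarray(chunk, dtype=cp.float64)
        out = cp.sqrt(data) * 2.0
    stream.synchronize()  # wait only for this stream's work
    return cp.asnumpy(out).tolist()


def run(num_threads=4, chunk_size=1000):
    # Hypothetical input: chunk i holds chunk_size copies of the value i + 1.
    chunks = [[float(i + 1)] * chunk_size for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(process_chunk, chunks))  # results keep input order


results = run()
```

In a real pipeline, the host-side preparation or I/O for each chunk would sit at the top of `process_chunk`, which is where spreading the work across several threads is expected to help.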
For Python, the Global Interpreter Lock (GIL) must be taken into account. Although the GIL restricts parallel execution of...