GPU-Accelerated Computing with Python 3 and CUDA
In this chapter, we began with an overview of CUDA streams, covering the concept of concurrency, when to use streams to improve performance, implicit synchronization, and how streams are executed on the GPU. We then demonstrated how to create streams in Numba-CUDA and outlined the requirements for asynchronous data transfers between the host and the device: the host buffers must live in pinned (page-locked) memory, and the transfers must be issued on non-default streams.
Next, we built a retina image processing pipeline using multiple streams to illustrate how streams can improve overall performance, and then used profiling results to analyze why and how streams produce the speedup.
We then introduced CUDA events, showing how they can be used to measure execution time for individual streams and to establish dependencies between streams. Finally, we explored combining multithreading with CUDA streams for scenarios where CPU data preparation becomes a bottleneck.
In the next chapter, we will explore how to scale our computations...