GPU-Accelerated Computing with Python 3 and CUDA
In this chapter, we learned about the concepts behind the CUDA programming model. We learned what a kernel is and how it can be launched on a grid of threads. The grid is composed of blocks, which are, in turn, composed of threads. We applied these concepts to write a GPU implementation for calculating a Julia set.
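As a reminder of how the grid, block, and thread indices combine, here is a minimal CPU-only sketch in plain Python. It is not the chapter's kernel: it emulates a one-dimensional launch sequentially, and the names launch_1d and julia_kernel, the constant c, and the sample points are all assumptions chosen for illustration. On a real GPU the threads would run in parallel and the indices would come from the CUDA runtime.

```python
# CPU-only emulation of a 1D CUDA launch; no GPU or numba required.
# launch_1d and julia_kernel are hypothetical names used for illustration.

def julia_kernel(block_idx, thread_idx, block_dim, points, c, max_iter, out):
    """Body run by one emulated thread: iterate z -> z*z + c for one point."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i >= len(points):                    # guard threads that fall past the data
        return
    z = points[i]
    count = 0
    while count < max_iter and abs(z) <= 2.0:
        z = z * z + c
        count += 1
    out[i] = count


def launch_1d(kernel, grid_dim, block_dim, *args):
    """Call the kernel once per (block, thread) pair, sequentially."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


# 17 sample points on the real axis, from -2.0 to 2.0 (assumed test data).
points = [complex(x / 4.0, 0.0) for x in range(-8, 9)]
out = [-1] * len(points)

block_dim = 8
grid_dim = (len(points) + block_dim - 1) // block_dim  # ceiling division

launch_1d(julia_kernel, grid_dim, block_dim, points, complex(-0.8, 0.156), 50, out)
```

The ceiling division launches more threads than there are points, which is why the kernel body checks its index before touching the output.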
We then saw some patterns for writing more robust kernels that adapt to the size of the grid, and applied them to our Julia set kernel. We looked at ways to make kernels more modular with device functions. We touched on two techniques for coordinating threads in problems that are not embarrassingly parallel: atomics and thread synchronization. We looked at how numba.cuda turns Python functions into CUDA kernels, and went over the language features that numba.cuda supports in kernels. Finally, we looked at specific types of problems that we can solve with less code using vectorize and reduce.
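One common pattern for making a kernel independent of the grid size is the grid-stride loop, combined here with a helper that plays the role of a device function. The sketch below is a CPU-only emulation in plain Python under stated assumptions: escape_count, strided_kernel, and launch_1d are hypothetical names, the launch is sequential rather than parallel, and the visits list exists only to show that each element is processed exactly once.

```python
# CPU-only emulation of a grid-stride loop; no GPU or numba required.
# escape_count, strided_kernel and launch_1d are hypothetical names.

def escape_count(z, c, max_iter):
    """Helper in the role of a device function: reusable logic called by the kernel."""
    count = 0
    while count < max_iter and abs(z) <= 2.0:
        z = z * z + c
        count += 1
    return count


def strided_kernel(block_idx, thread_idx, block_dim, grid_dim,
                   points, c, max_iter, out, visits):
    """Each emulated thread handles every stride-th element, starting at its own index."""
    start = block_idx * block_dim + thread_idx  # this thread's first element
    stride = grid_dim * block_dim               # total number of threads in the grid
    for i in range(start, len(points), stride):
        out[i] = escape_count(points[i], c, max_iter)
        visits[i] += 1                          # bookkeeping for the check below


def launch_1d(kernel, grid_dim, block_dim, *args):
    """Call the kernel once per (block, thread) pair, sequentially."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, grid_dim, *args)


c = complex(-0.8, 0.156)                        # assumed Julia constant
points = [complex(-2.0 + 4.0 * k / 99, 0.5) for k in range(100)]
out = [-1] * len(points)
visits = [0] * len(points)

# Only 2 blocks of 4 threads (8 threads) for 100 points.
launch_1d(strided_kernel, 2, 4, points, c, 50, out, visits)
```

Because each thread steps through the data by the total thread count, the same kernel gives the same result whether the grid is smaller than, equal to, or larger than the input.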
Writing correct and performant CUDA kernels is not easy. Therefore, in...