Book Image

Hands-On GPU Programming with Python and CUDA

By : Dr. Brian Tuomanen
Book Image

Hands-On GPU Programming with Python and CUDA

By: Dr. Brian Tuomanen

Overview of this book

Hands-On GPU Programming with Python and CUDA hits the ground running: you’ll start by learning how to apply Amdahl’s Law, use a code profiler to identify bottlenecks in your Python code, and set up an appropriate GPU programming environment. You’ll then see how to “query” the GPU’s features and copy arrays of data to and from the GPU’s own memory. As you make your way through the book, you’ll launch code directly onto the GPU and write full blown GPU kernels and device functions in CUDA C. You’ll get to grips with profiling GPU code effectively and fully test and debug your code using Nsight IDE. Next, you’ll explore some of the more well-known NVIDIA libraries, such as cuFFT and cuBLAS. With a solid background in place, you will now apply your new-found knowledge to develop your very own GPU-based deep neural network from scratch. You’ll then explore advanced topics, such as warp shuffling, dynamic parallelism, and PTX assembly. In the final chapter, you’ll see some topics and applications related to GPU programming that you may wish to pursue, including AI, graphics, and blockchain. By the end of this book, you will be able to apply GPU programming to problems related to data science and high-performance computing.
Table of Contents (15 chapters)

Summary

The main advantage of using a GPU over a CPU is its increased throughput, which means that we can execute more parallel code simultaneously on GPU than on a CPU; a GPU cannot make recursive algorithms or nonparallelizable algorithms somewhat faster. We see that some tasks, such as the example of building a house, are only partially parallelizable—in this example, we couldn't speed up the process of designing the house (which is intrinsically serial in this case), but we could speed up the process of the construction, by hiring more laborers (which is parallelizable in this case).

We used this analogy to derive Amdahl's Law, which is a formula that can give us a rough estimate of potential speedup for a program if we know the percentage of execution time for code that is parallelizable, and how many processors we will have to run this code. We then applied Amdahl's Law to analyze a small program that generates the Mandelbrot set and dumps it to an image file, and we determined that this would be a good candidate for parallelization onto a GPU. Finally, we ended with a brief overview of profiling code with the cPython module; this allows us to see where the bottlenecks in a program are, without explicitly timing function calls.

Now that we have a few of the fundamental concepts in place, and have a motivator to learn GPU programming, we will spend the next chapter setting up a Linux- or Windows 10-based GPU programming environment. We will then immediately dive into the world of GPU programming in the following chapter, where we will actually write a GPU-based version of the Mandelbrot program that we saw in this chapter.