Hands-On GPU Programming with Python and CUDA

By: Dr. Brian Tuomanen

Overview of this book

Hands-On GPU Programming with Python and CUDA hits the ground running: you’ll start by learning how to apply Amdahl’s Law, use a code profiler to identify bottlenecks in your Python code, and set up an appropriate GPU programming environment. You’ll then see how to “query” the GPU’s features and copy arrays of data to and from the GPU’s own memory. As you make your way through the book, you’ll launch code directly onto the GPU and write full-blown GPU kernels and device functions in CUDA C. You’ll get to grips with profiling GPU code effectively, and with fully testing and debugging your code using the Nsight IDE. Next, you’ll explore some of the more well-known NVIDIA libraries, such as cuFFT and cuBLAS. With a solid background in place, you will then apply your new-found knowledge to develop your very own GPU-based deep neural network from scratch. You’ll then explore advanced topics, such as warp shuffling, dynamic parallelism, and PTX assembly. In the final chapter, you’ll see some topics and applications related to GPU programming that you may wish to pursue, including AI, graphics, and blockchain. By the end of this book, you will be able to apply GPU programming to problems related to data science and high-performance computing.

Thread-safe atomic operations

We will now learn about atomic operations in CUDA. Atomic operations are very simple, thread-safe operations that output to a single global array element or shared memory variable. Without them, many threads writing to the same location at once would lead to race conditions.
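To see why an unprotected update goes wrong, it can help to spell out the interleaving by hand. The following is a minimal CPU-side sketch in plain Python, not CUDA code: the two "threads" are simulated by ordering the reads and writes explicitly, and the function names are hypothetical, chosen only for this illustration.

```python
# CPU-side illustration only: plain Python, not CUDA.
# Two simulated "threads" each want to add 1 to a shared counter.

def racy_increment_pair(counter):
    """Interleave two read-modify-write sequences the way a race can."""
    read_a = counter[0]      # thread A reads 0
    read_b = counter[0]      # thread B reads 0 before A has written back
    counter[0] = read_a + 1  # thread A writes 1
    counter[0] = read_b + 1  # thread B also writes 1, so A's update is lost
    return counter[0]

def atomic_increment_pair(counter):
    """Each read-modify-write finishes before the next one starts."""
    for _ in range(2):
        # One indivisible step per thread, which is what an atomic add guarantees.
        counter[0] = counter[0] + 1
    return counter[0]

racy_result = racy_increment_pair([0])
atomic_result = atomic_increment_pair([0])
print(racy_result, atomic_result)  # prints "1 2"
```

The racy version loses one of the two increments because both reads happen before either write. An atomic operation rules this interleaving out by making the read, the modification, and the write a single indivisible step.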

Let's think of one example. Suppose that we have a kernel, and at some point every thread sets a local variable called x. We then want to find the maximum value of x over all threads and store it in a shared variable that we declare with __shared__ int x_largest. We can do this by just calling atomicMax(&x_largest, x) in every thread.
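The behavior described here can be mimicked on the CPU. The sketch below is an analogy in plain Python rather than real CUDA code: a threading.Lock stands in for the hardware's atomicity guarantee, a one-element list stands in for the shared variable x_largest, and NUM_THREADS, local_x, and kernel are hypothetical names invented for this illustration. Like CUDA's atomicMax, the helper returns the old value that was stored.

```python
import threading

NUM_THREADS = 64          # hypothetical number of threads for this sketch
lock = threading.Lock()   # stands in for the hardware's atomicity guarantee
x_largest = [0]           # stands in for `__shared__ int x_largest`, set to 0 up front

def atomic_max(cell, value):
    """Emulate atomicMax: store max(old, value) in one step and return old."""
    with lock:
        old = cell[0]
        if value > old:
            cell[0] = value
        return old

def local_x(thread_id):
    """Hypothetical per-thread value of x; any non-negative integer would do."""
    return (thread_id * 37) % 101

def kernel(thread_id):
    x = local_x(thread_id)    # each thread sets its own local x
    atomic_max(x_largest, x)  # plays the role of atomicMax(&x_largest, x)

threads = [threading.Thread(target=kernel, args=(tid,)) for tid in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

expected = max(local_x(tid) for tid in range(NUM_THREADS))
print(x_largest[0] == expected)  # prints "True"
```

In an actual kernel, shared memory is not initialized automatically, so one thread would typically set x_largest to a starting value and the block would synchronize with __syncthreads() before any thread calls atomicMax.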

Let's look at a brief example of atomic operations. We will write a small program for two experiments:

  • Setting a variable to 0 and then adding 1 to it from each thread
  • Finding the maximum thread ID value across all threads

Let's start out by setting the...