Hands-On GPU Programming with Python and CUDA

By: Dr. Brian Tuomanen

Overview of this book

Hands-On GPU Programming with Python and CUDA hits the ground running: you’ll start by learning how to apply Amdahl’s Law, use a code profiler to identify bottlenecks in your Python code, and set up an appropriate GPU programming environment. You’ll then see how to “query” the GPU’s features and copy arrays of data to and from the GPU’s own memory. As you make your way through the book, you’ll launch code directly onto the GPU and write full-blown GPU kernels and device functions in CUDA C. You’ll get to grips with profiling GPU code effectively and fully test and debug your code using the Nsight IDE. Next, you’ll explore some of the more well-known NVIDIA libraries, such as cuFFT and cuBLAS. With a solid background in place, you will then apply your new-found knowledge to develop your very own GPU-based deep neural network from scratch. You’ll then explore advanced topics, such as warp shuffling, dynamic parallelism, and PTX assembly. In the final chapter, you’ll see some topics and applications related to GPU programming that you may wish to pursue, including AI, graphics, and blockchain. By the end of this book, you will be able to apply GPU programming to problems related to data science and high-performance computing.
Table of Contents (15 chapters)

Why GPU Programming?

It turns out that besides rendering graphics for video games, graphics processing units (GPUs) also give the general consumer a readily accessible means of doing massively parallel computing. An average person can now buy a modern $2,000 GPU card from a local electronics store, plug it into their PC at home, and use it almost immediately for computational power that, only 5 or 10 years ago, would have been available solely in the supercomputing labs of top corporations and universities. This open accessibility has become apparent in many ways in recent years, as a brief look at the news shows: cryptocurrency miners use GPUs to generate digital money such as Bitcoin, geneticists and biologists use GPUs for DNA analysis and research, physicists and mathematicians use GPUs for large-scale simulations, AI researchers program GPUs to write plays and compose music, and major internet companies, such as Google and Facebook, use farms of GPU-equipped servers for large-scale machine learning tasks. The list goes on and on.

This book is primarily aimed at bringing you up to speed with GPU programming, so that you too can begin using the power of GPUs as soon as possible, no matter what your end goal is. We aim to cover the core essentials of how to program a GPU, rather than provide intricate technical details and schematics of how a GPU works. Toward the end of the book, we will provide additional resources so that you can specialize further and apply your new knowledge of GPUs. (Details of the required technical knowledge and hardware follow this section.)

In this book, we will be working with CUDA, a framework for general-purpose GPU (GPGPU) programming from NVIDIA, which was first released in 2007. While CUDA is proprietary and works only on NVIDIA GPUs, it is a mature and stable platform that is relatively easy to use, provides an unmatched set of first-party accelerated mathematical and AI-related libraries, and involves minimal hassle when it comes to installation and integration. Moreover, there are readily available and standardized Python libraries, such as PyCUDA and Scikit-CUDA, which make GPGPU programming all the more accessible to aspiring GPU programmers. For these reasons, we are opting to go with CUDA for this book.

CUDA is always pronounced coo-duh, and never spelled out letter by letter as C-U-D-A! CUDA originally stood for Compute Unified Device Architecture, but NVIDIA has dropped the acronym and now uses CUDA as a proper name written in all caps.

We will now start our journey into GPU programming with an overview of Amdahl's Law. Amdahl's Law is a simple but effective method for estimating the potential speed gains we can get by offloading a program or algorithm onto a GPU; this will help us determine whether it's worth our effort to rewrite our code to make use of the GPU. We will then briefly review how to profile our Python code with the cProfile module, to help us find the bottlenecks in our code.
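As a rough preview of the kind of estimate involved, Amdahl's Law is commonly stated as speedup = 1 / ((1 - p) + p / N), where p is the fraction of the original run time that can be parallelized and N is the number of parallel processing units. The sketch below uses that standard form; the function name `amdahl_speedup` and its parameter names are our own illustrative choices rather than anything defined in this book, and the numbers in the usage section are made up purely for illustration.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Theoretical speedup predicted by the standard form of Amdahl's Law.

    parallel_fraction: share of the original run time that can be
        parallelized, between 0 and 1 (hypothetical parameter name).
    n_workers: number of parallel processing units, at least 1
        (hypothetical parameter name).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")

    # The serial part of the program is not sped up at all.
    serial_fraction = 1.0 - parallel_fraction
    # The parallel part is assumed to be divided evenly among the workers.
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    # Illustrative (made-up) scenario: 90% of the run time is parallelizable.
    for workers in (2, 16, 128, 1024):
        speedup = amdahl_speedup(0.9, workers)
        print(f"{workers:5d} workers -> estimated speedup {speedup:.2f}x")
```

Note that with 90% of the run time parallelizable, the estimate can never exceed 1 / (1 - 0.9) = 10x, no matter how many workers are added. This is why the serial portion of a program matters so much when deciding whether a GPU port is worth the effort.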

The learning outcomes for this chapter are as follows:

  • Understand Amdahl's Law
  • Apply Amdahl's Law in the context of your code
  • Use the cProfile module for basic profiling of Python code
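To give a feel for the last outcome, here is a minimal sketch of profiling with the standard library's cProfile and pstats modules. The functions `slow_sum_of_squares`, `fast_constant`, `main`, and `profile_report` are hypothetical stand-ins invented for this example, not code from this book; the idea is simply that the deliberately slow function should dominate the cumulative time in the report.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately slow pure-Python loop (hypothetical bottleneck)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_constant():
    """Trivial function that should barely register in the profile."""
    return 42


def main():
    slow_sum_of_squares(200_000)
    fast_constant()


def profile_report(func):
    """Run func under cProfile and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    # Collect the statistics into a string instead of printing directly.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(10)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(main))
```

An existing script can also be profiled without modification from the command line, for example with `python -m cProfile -s cumtime your_script.py`, where `your_script.py` is a placeholder for your own file.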