Learn CUDA Programming

By: Jaegeun Han, Bharatkumar Sharma

Overview of this book

Compute Unified Device Architecture (CUDA) is NVIDIA's GPU computing platform and application programming interface. It's designed to work with programming languages such as C, C++, and Python. With CUDA, you can leverage a GPU's parallel computing power for a range of high-performance computing applications in the fields of science, healthcare, and deep learning.

Learn CUDA Programming will help you learn GPU parallel programming and understand its modern applications. In this book, you'll discover CUDA programming approaches for modern GPU architectures. You'll not only be guided through GPU features, tools, and APIs, you'll also learn how to analyze performance with sample parallel programming algorithms. This book will help you optimize the performance of your apps by giving insights into CUDA programming platforms with various libraries, compiler directives (OpenACC), and other languages. As you progress, you'll learn how additional computing power can be generated using multiple GPUs in a box or in multiple boxes. Finally, you'll explore how CUDA accelerates deep learning algorithms, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs).

By the end of this CUDA book, you'll be equipped with the skills you need to integrate the power of GPU computing in your applications.

GPU memory evolution

GPU architectures have evolved over time, and their memory architectures have changed considerably along with them. If we look at the last four generations, some common patterns emerge, including the following:

  • Memory capacity has, in general, increased at each level of the hierarchy (registers, caches, and shared memory).
  • Memory bandwidth, as well as capacity, has grown with each new generation of architecture (see the sketch after this list for how the theoretical peak bandwidth is derived).
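You can check the bandwidth point on your own hardware. The following is a minimal sketch, assuming device 0 and the conventional DDR factor of two transfers per clock (these assumptions are not from the book); it queries the memory clock and bus width reported by the CUDA runtime and derives the theoretical peak bandwidth. Compile it with nvcc:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // assumption: device 0

    // Theoretical peak bandwidth in GB/s:
    // 2 transfers per clock (DDR) * memory clock (kHz -> Hz) * bus width (bits -> bytes)
    // Note: on some newer drivers these fields may be reported differently.
    double peak_gb_s = 2.0 * prop.memoryClockRate * 1e3 *
                       (prop.memoryBusWidth / 8.0) / 1e9;

    printf("%s: memory clock %.0f MHz, bus width %d bits, "
           "theoretical peak bandwidth %.1f GB/s\n",
           prop.name, prop.memoryClockRate * 1e-3, prop.memoryBusWidth, peak_gb_s);
    return 0;
}
```

For a Volta V100 (roughly 877 MHz memory clock on a 4,096-bit HBM2 bus), this formula gives about 900 GB/s, which matches the commonly quoted figure.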

The following table shows the properties for the last four generations:

| Memory type   | Properties            | Volta V100      | Pascal P100 | Maxwell M60 | Kepler K80  |
|---------------|------------------------|-----------------|-------------|-------------|-------------|
| Register      | Size per SM            | 256 KB          | 256 KB      | 256 KB      | 256 KB      |
| L1            | Size                   | 32...128 KiB    | 24 KiB      | 24 KiB      | 16...48 KiB |
| L1            | Line size              | 32 B            | 32 B        | 32 B        | 128 B       |
| L2            | Size                   | 6,144 KiB       | 4,096 KiB   | 2,048 KiB   | 1,536 KiB   |
| L2            | Line size              | 64 B            | 32 B        | 32 B        | 32 B        |
| Shared memory | Size per SMX           | Up to 96 KiB    | 64 KiB      | 64 KiB      | 48 KiB      |
| Shared memory | Size per GPU           | Up to 7,689 KiB | 3,584 KiB   | 1,536 KiB   | 624 KiB     |
| Shared memory | Theoretical bandwidth  | 13,800 GiB...   |             |             |             |
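If you want to compare your own GPU against this table, a minimal sketch along the same lines (again assuming device 0, an assumption not taken from the book) prints the register file, L2 cache, and shared memory sizes reported by cudaGetDeviceProperties; the per-GPU shared memory figure is simply the per-SM value multiplied by the SM count:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // assumption: device 0

    // regsPerMultiprocessor counts 32-bit registers, so multiply by 4 for bytes
    printf("Registers per SM     : %d KB\n",  prop.regsPerMultiprocessor * 4 / 1024);
    printf("L2 cache size        : %d KiB\n", prop.l2CacheSize / 1024);
    printf("Shared memory per SM : %zu KiB\n", prop.sharedMemPerMultiprocessor / 1024);
    printf("Shared memory per GPU: %zu KiB\n",
           prop.sharedMemPerMultiprocessor * prop.multiProcessorCount / 1024);
    printf("Number of SMs        : %d\n",     prop.multiProcessorCount);
    return 0;
}
```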