We began this chapter with an introduction to the different types of GPU memory. We covered global, texture, and shared memory in detail, as well as registers. We also looked at the new features that the evolution of GPU memory has brought, such as unified memory, which improves programmer productivity, and we saw how these features are implemented in newer GPU architectures such as Pascal and Volta.
In the next chapter, we will go into the details of CUDA thread programming and how to launch different thread configurations to get the best performance out of the GPU hardware. We will also introduce CUDA Toolkit features such as cooperative groups for flexible thread programming, as well as multi-precision programming on GPUs.