GPU-Accelerated Computing with Python 3 and CUDA
When a thread requests data from global memory, the hardware fetches more than just the requested bytes. Read and write requests from all threads in a warp are bundled into transactions, and each transaction always moves a fixed amount of data from fixed positions in memory.
Think of memory as a very long line of boxes, each representing one byte (8 bits) of data. Very often, a piece of data required by a thread will be 4 consecutive bytes (for 32-bit floats or integers) or 8 consecutive bytes (for 64-bit floats or integers). Now think of a memory transaction as a container crane that packs several of these boxes for shipment. The crane can only stop at well-defined positions along the line, and it can only put a fixed number of consecutive boxes into a container. Only entire containers can be shipped from memory to the SMs, so a container is shipped in full even if a thread needs just one of the boxes inside it. The idea is visualized in Figure 5.6:

Figure 5.6 – An analogy for memory transactions. Units of data are represented by dark boxes...
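The crane analogy can be turned into a small counting exercise. The following is a minimal sketch in plain Python (no GPU required) that counts how many containers a warp's requests would touch. The container size `SEGMENT_BYTES = 32` and the helper name `segments_touched` are assumptions made for illustration only; actual transaction sizes depend on the GPU architecture and on which cache level serves the request.

```python
WARP_SIZE = 32       # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 32   # ASSUMED container size, for illustration only


def segments_touched(byte_addresses, item_bytes=4, segment_bytes=SEGMENT_BYTES):
    """Return the sorted indices of all fixed-size segments ("containers")
    needed to serve one item of `item_bytes` bytes at each start address.

    Hypothetical helper: it models the analogy, not real hardware behavior.
    """
    segments = set()
    for addr in byte_addresses:
        first = addr // segment_bytes                    # container holding the first byte
        last = (addr + item_bytes - 1) // segment_bytes  # container holding the last byte
        segments.update(range(first, last + 1))
    return sorted(segments)


# Thread i reads the 4-byte value at byte offset 4 * i (neighbouring boxes).
contiguous = [4 * i for i in range(WARP_SIZE)]

# Thread i reads the 4-byte value at byte offset 128 * i (boxes far apart).
strided = [128 * i for i in range(WARP_SIZE)]

# Same as contiguous, but starting 4 bytes past a container boundary.
shifted = [4 * i + 4 for i in range(WARP_SIZE)]

n_contiguous = len(segments_touched(contiguous))
n_strided = len(segments_touched(strided))
n_shifted = len(segments_touched(shifted))

print(n_contiguous, n_strided, n_shifted)
```

Under these assumptions, all three access patterns request the same 128 bytes of useful data, but the number of containers that must be shipped differs: neighbouring, well-aligned requests share containers, whereas widely spaced requests each need a container of their own.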