-
Book Overview & Buying
-
Table Of Contents
GPU-Accelerated Computing with Python 3 and CUDA
By :
CuPy abstracts much of the underlying complexity of GPU computing, enabling high-performance operations without low-level management. However, this abstraction does not eliminate the need for performance awareness. This section highlights key considerations to optimize CuPy-based code effectively.
By default, CuPy arrays use the np.float64 data type for compatibility with NumPy. However, as alluded to multiple times, performance on most GPUs is better when using single precision. When investigating performance, always look at the data types of arrays.
Special attention should be paid to silent casting. Multiplying a float32 array with a float64 array results in a float64 array because float32 is silently upcast to float64.
The dtype of an existing array can be changed with the astype method. However, this will copy the data, which can be an expensive operation. Provide the dtype of an array at creation time, whenever possible.