GPU-Accelerated Computing with Python 3 and CUDA
Long and complex kernels are often difficult to read, reason about, and maintain. To improve modularity, parts of the kernel logic can be factored out into device functions. Breaking code into smaller components improves readability and enables code reuse, since a single device function can be called from multiple kernels.
Let's illustrate device functions using the Julia set kernel. The calculation of the next iteration of z can be factored out into the following small function:
```python
from numba import cuda

@cuda.jit(device=True, inline=True)
def update_z(z: complex, c: complex) -> complex:
    # One step of the Julia set iteration: z -> z**2 + c
    return z ** 2 + c
```
By passing device=True to the cuda.jit decorator, the function is compiled as a device function instead of a kernel: it can be called from kernels and other device functions, but it cannot be launched from the host. Passing inline=True asks the compiler to insert the device function's logic directly into the kernel when the kernel is compiled, which avoids the overhead of a function call. This device function can be called in the kernel as follows:
z ...