The two main execution units in OpenCL are the kernels and the host program. The kernels execute on the OpenCL device, and the host program runs on the host computer. The main purpose of the host program is to create and query the platform and device attributes, define a context for the kernels, build the kernels, and manage their execution.
When the host submits a kernel to the device, an N-dimensional index space is created, where N is 1, 2, or 3. One kernel instance is created for each coordinate of this index space. Such an instance is called a "work item", and the index space is called the NDRange. The following screenshot shows the three scenarios for a 1-, 2-, and 3-dimensional NDRange:
In the saxpy example, which we discussed in the previous chapter, we used a global size of 1024 and a local size of 64. Each work item computes the corresponding:
C[global id] = alpha* A[global id] + B...