In this chapter, we looked in detail at how to use the GPU in your laptop to perform computation on behalf of R programs through the ROpenCL package. Along the way, you also learned a little about writing highly efficient kernel functions in the C programming language, using loop unrolling and careful use of high-speed memory.
As we noted, OpenCL aims for heterogeneous portability, in which the same code can run on a variety of devices (including the CPU itself). In practice, and with GPUs in particular, extracting the maximum possible performance still leaves room for optimization tailored to the characteristics of the underlying device hardware. Obtaining the best performance from a kernel function means balancing memory access against exploiting vector processing, and ultimately requires your own experimentation.
In the next and final chapter, we will distill the essential lessons from the various different...