Welcome to the OpenCL Parallel Programming Development Cookbook! Whew, that was more than a mouthful. This book was written by a developer (that's me) for a developer (hopefully that's you). It will look familiar to some and new to others. It is the result of my experience with OpenCL and, more importantly, with programming heterogeneous computing environments. I wanted to organize the things I've learned and share them with you, the reader, so I decided to present each problem as a recipe. The recipes are meant to be concise, although admittedly some are longer than others, because the problems I've chosen, which appear as the chapters of this book, describe how you can apply these techniques to your current or future work. I hope this book earns a place among the references on your desk, and that understanding the solutions to these problems helps you as much as it helped me.
This book was written with the software developer in mind: one who wishes to know not only how to program in parallel but also how to think in parallel. In my opinion, the latter is more important than the former, but neither alone solves anything. This book reinforces each concept with code and expands on it as we build on more recipes.
This book is structured to ease you gently into OpenCL by first familiarizing you with its core concepts. Then we'll take deep dives, applying that newly gained knowledge to the various recipes and to the general parallel computing problems you'll encounter in your work.
To get the most out of this book, you should ideally be a software developer or an embedded software developer who is interested in parallel software development but doesn't really know where or how to start. You should know some C or C++ (you can pick up C, since it's relatively simple) and be comfortable using a cross-platform build system, for example, CMake in Linux environments. The nice thing about CMake is that it can generate build environments for those of you who are comfortable using Microsoft's Visual Studio, Apple's Xcode, or some other integrated development environment. I have to admit that the examples in this book did not use any of these IDEs.
Chapter 1, Using OpenCL, sets the stage by establishing the purpose of and motivation for OpenCL. Its recipes outline the core concepts, covering the essentials of devices and how they interact, and illustrate them with real working code. The reader will learn about contexts and devices and how to create code that runs on those devices.
Chapter 2, Understanding OpenCL Data Transfer and Partitioning, discusses buffer objects in OpenCL and strategies for partitioning data among them. Readers will then learn what work items are and how data partitioning takes effect through OpenCL's abstractions.
Chapter 3, Understanding OpenCL Data Types, explains the two general kinds of data types that OpenCL offers, namely scalar and vector types, how they are used to solve different problems, and how OpenCL abstracts the native vector architectures of processors. Readers will be shown how to achieve programmable vectorization through OpenCL.
Chapter 4, Understanding OpenCL Functions, discusses the various built-in functions OpenCL offers for solving day-to-day problems, for example, geometric, permutation, and trigonometric functions. It also explains how to accelerate such computations by using their vectorized counterparts.
Chapter 5, Developing a Histogram OpenCL Program, walks through the lifecycle of a typical OpenCL development effort. It also discusses data partitioning strategies that rely on understanding the algorithm in question. Readers will come to realize that not all algorithms or problems call for the same approach.
Chapter 6, Developing a Sobel Edge Detection Filter, guides you through building an edge detection filter using Sobel's method. You will be introduced to some mathematical formalism, including convolution theory in one and two dimensions, along with the accompanying code. Finally, we introduce how profiling works in OpenCL and apply it in this recipe.
Chapter 7, Developing the Matrix Multiplication with OpenCL, discusses parallelizing matrix multiplication by studying its parallel form and applying the transformation from sequential to parallel code. Next, it optimizes the matrix multiplication by discussing how to increase computational throughput and how to warm the cache.
Chapter 8, Developing the Sparse Matrix-Vector Multiplication with OpenCL, discusses the context of this computation and the conventional method that relies on it, namely the conjugate gradient method, with just enough math. Once that intuition is developed, readers will be shown how various storage formats for sparse matrices affect the parallel computation, and will then examine the ELLPACK, ELLPACK-R, COO, and CSR formats.
Chapter 9, Developing Bitonic Sort Using OpenCL, introduces readers to the world of sorting algorithms and focuses on the parallel sorting network known as bitonic sort. As in all the other chapters, the recipes present the theory and its sequential implementation, extract the parallelism through transformation, and then develop the final parallel version.
Chapter 10, Developing the Radix Sort with OpenCL, introduces radix sort, a classic example of a non-comparison-based sorting algorithm, which suits a GPU architecture better than comparison-based algorithms such as QuickSort. The reader is also introduced to another core parallel programming technique known as reduction, and we develop the intuition for how reduction helps radix sort perform better. The radix sort recipe also demonstrates programming with multiple kernels and highlights the advantages as well as the disadvantages of that approach.
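To make the "non-comparison-based" distinction in the Chapter 10 summary concrete, here is a minimal sequential sketch in C of a least-significant-digit radix sort on 32-bit unsigned keys. It is not the OpenCL implementation the chapter develops; the function name radix_sort_u32 and the 8-bit digit width are assumptions chosen for illustration. Notice that keys are never compared with each other: each pass builds a histogram of one digit, turns it into starting offsets with an exclusive prefix sum, and scatters the keys stably into place.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sequential LSD radix sort of n unsigned 32-bit keys (ascending).
   Uses 8-bit digits, so four passes cover all 32 bits.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int radix_sort_u32(uint32_t *keys, size_t n)
{
    if (n < 2)
        return 0;                      /* nothing to sort */

    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (unsigned shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};

        /* 1. Histogram: count how many keys have each digit value. */
        for (size_t i = 0; i < n; ++i)
            ++count[(keys[i] >> shift) & 0xFFu];

        /* 2. Exclusive prefix sum: count[d] becomes the first output
              index for keys whose current digit is d. */
        size_t offset = 0;
        for (size_t d = 0; d < 256; ++d) {
            size_t c = count[d];
            count[d] = offset;
            offset += c;
        }

        /* 3. Stable scatter: keys keep their relative order within a digit. */
        for (size_t i = 0; i < n; ++i)
            tmp[count[(keys[i] >> shift) & 0xFFu]++] = keys[i];

        memcpy(keys, tmp, n * sizeof *keys);
    }

    free(tmp);
    return 0;
}
```

Because every pass is stable, sorting by the lowest byte first and the highest byte last leaves the array fully sorted after four passes. The histogram and prefix-sum steps are the parts that parallel versions typically spread across many work items.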
You need to be comfortable working in a Linux environment, as the examples were tested on the 64-bit Ubuntu 12.10 operating system. The requirements are as follows:
GNU GCC C/C++ compiler Version 4.6.1 (at least)
OpenCL 1.2 SDK by AMD, Intel & NVIDIA
AMD APP SDK Version 2.8 with AMD Catalyst Linux Display Driver Version 13.4
Intel OpenCL SDK 2012
CMake Version 2.8 (at least)
Clang Version 3.1 (at least)
Microsoft Visual C++ 2010 (if you work on Windows)
Boost Library Version 1.53
VexCL (by Denis Demidov)
CodeXL Profiler by AMD (Optional)
At least eight hours of sleep
An open and receptive mind
A fresh brew of coffee or whatever works for you
This book is intended for software developers who have often wondered what to do with their newly bought CPU or GPU other than play computer games. That said, this book isn't about toy algorithms that work only on your workstation at home. It is ideal for developers who have a working knowledge of C/C++ and want to learn how to write OpenCL programs that execute in heterogeneous computing environments.
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.
Code words in text are shown as follows: "We can include other contexts through the use of the #include directive."
A block of code is set as follows:
cl_uint sortOrder = 0; // 0 for descending order, 1 for ascending order
cl_uint stages = 0;
for (unsigned int i = LENGTH; i > 1; i >>= 1)
    ++stages;
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&device_A_in);
clSetKernelArg(kernel, 3, sizeof(cl_uint), (void*)&sortOrder);
#ifdef USE_SHARED_MEM
clSetKernelArg(kernel, 4, (GROUP_SIZE << 1) * sizeof(cl_uint), NULL);
#elif defined(USE_SHARED_MEM_2)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
cl_uint sortOrder = 0; // 0 for descending order, 1 for ascending order
cl_uint stages = 0;
for (unsigned int i = LENGTH; i > 1; i >>= 1)
    ++stages;
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&device_A_in);
clSetKernelArg(kernel, 3, sizeof(cl_uint), (void*)&sortOrder);
#ifdef USE_SHARED_MEM
clSetKernelArg(kernel, 4, (GROUP_SIZE << 1) * sizeof(cl_uint), NULL);
#elif defined(USE_SHARED_MEM_2)
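The loop in the listing above counts how many times LENGTH can be halved before reaching 1, which for a power-of-two LENGTH is log2(LENGTH), the number of stages in a bitonic sorting network. The following minimal C sketch isolates that arithmetic; the helper names is_power_of_two, bitonic_stages, and bitonic_passes are hypothetical and not part of the book's code. It assumes, as bitonic networks usually do, that the input length is a power of two; for other lengths the loop yields the floor of log2.

```c
/* Hypothetical helpers illustrating the bitonic-sort setup arithmetic. */

/* Returns 1 if n is a nonzero power of two (bitonic networks assume this). */
static int is_power_of_two(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Same loop as the listing: halve the length until it reaches 1,
   counting iterations, i.e. stages = log2(length) for a power of two. */
static unsigned bitonic_stages(unsigned length)
{
    unsigned stages = 0;
    for (unsigned i = length; i > 1; i >>= 1)
        ++stages;
    return stages;
}

/* Stage s (counting from 1) consists of s compare-and-swap passes,
   so the whole network needs 1 + 2 + ... + k = k(k+1)/2 passes. */
static unsigned bitonic_passes(unsigned length)
{
    unsigned k = bitonic_stages(length);
    return k * (k + 1) / 2;
}
```

For example, 1024 elements give 10 stages and 55 compare-and-swap passes in total.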
Any command-line input or output is written as follows:
# gcc -Wall test.c -o test
New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "clicking on the Next button moves you to the next screen".
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.