OpenCL Parallel Programming Development Cookbook

By: Raymond Tay

Overview of this book

<p>OpenCL (Open Computing Language) is the first royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers, mobiles, and embedded devices. OpenCL greatly improves speed and responsiveness for a wide spectrum of applications in numerous market categories, from gaming and entertainment to scientific and medical software. OpenCL has proved itself to be versatile: it runs not only on operating systems like Windows and Linux powered by Intel and AMD processors, but also on low-power chips like ARM, and it has been adopted by processor manufacturers like ARM Corp, Vivante, and Altera, among others.</p> <p>OpenCL Parallel Programming Development Cookbook was designed to be practical, striking a good balance between theory and application. Learning to program in a parallel way is relatively easy, but taking efficient advantage of all the resources available to you is quite different. You need to be shown not only the application, but also the theory behind it.</p> <p>This book is roughly in two parts: the first covers the fundamentals of OpenCL parallel development, and the second covers the various algorithms we will explore with you. Each part is packed with many code samples and illustrations to demonstrate various concepts. The first part is essential for a beginner to not only program in parallel, but also to think in parallel and become equipped with the mental model with which to tackle parallel programming. The second part consists of seven different algorithms that the author has identified; you will learn various parallel programming techniques that experts have used in the past 60 years and that are applicable to OpenCL.</p> <p>This book will demonstrate how to think in parallel by illustrating programming techniques like data partitioning, thread coarsening, register tiling, data pre-fetching, and algorithm transformation. 
These techniques are demonstrated in the seven algorithms you’ll be shown, from image processing and solving sparse linear systems to in-memory sorting.<br />OpenCL Parallel Programming Development Cookbook combines recipes, illustrations, code, and explanations to allow you to learn the essentials of parallel programming in OpenCL, and the author has added enough math so that readers understand the motivation and can lay the foundation upon which they will begin their own exploration.</p>

Preface

Welcome to the OpenCL Parallel Programming Development Cookbook! Whew, that was more than a mouthful. This book was written by a developer (that's me) for a developer (hopefully that's you). This book will look familiar to some and distinct to others. It is a result of my experience with OpenCL, but more importantly with programming heterogeneous computing environments. I wanted to organize the things I've learned and share them with you, the reader, and decided to take an approach where each problem is categorized into a recipe. These recipes are meant to be concise, but admittedly some are longer than others. That is because the problems I've chosen, which manifest as chapters in this book, describe how you can apply those techniques to your current or future work. Hopefully this book can become one of the references that rest on your desk. I certainly hope that understanding the solutions to these problems helps you as much as it helped me.

This book was written with the software developer in mind: one who wishes to know not only how to program in parallel but also how to think in parallel. The latter is, in my opinion, more important than the former, but neither of them alone solves anything. This book reinforces each concept with code and builds on it as we work through more recipes.

This book is structured to ease you gently into OpenCL by familiarizing you with its core concepts, and then we'll take deep dives by applying that newly gained knowledge to the various recipes and general parallel computing problems you'll encounter in your work.

To get the most out of this book, it is highly recommended that you are a software developer or an embedded software developer who is interested in parallel software development but doesn't really know where or how to start. Ideally, you should know some C or C++ (you can pick C up since it's relatively simple) and be comfortable using a cross-platform build system, for example, CMake in Linux environments. The nice thing about CMake is that it allows you to generate build environments for those of you who are comfortable using Microsoft's Visual Studio, Apple's Xcode, or some other integrated development environment. I have to admit that the examples in this book used none of these IDEs.

What this book covers

Chapter 1, Using OpenCL, sets the stage for the reader by establishing the purpose and motivation of OpenCL. The core concepts are outlined in recipes covering the intrinsics of devices and their interactions, backed by real working code. The reader will learn about contexts and devices and how to create code that runs on those devices.

Chapter 2, Understanding OpenCL Data Transfer and Partitioning, discusses the buffer objects in OpenCL and strategies for partitioning data amongst them. Subsequently, readers will learn what work items are and how data partitioning can be put into effect by leveraging OpenCL abstractions.

Chapter 3, Understanding OpenCL Data Types, explains the two general data types that OpenCL offers, namely scalar and vector data types, how they are used to solve different problems, and how OpenCL abstracts native vector architectures in processors. Readers will be shown how they can effect programmable vectorization through OpenCL.

Chapter 4, Understanding OpenCL Functions, discusses the various functions OpenCL offers for solving day-to-day problems, for example, geometry, permuting, and trigonometry. It also explains how to accelerate these operations by using their vectorized counterparts.

Chapter 5, Developing a Histogram OpenCL program, walks through the lifecycle of a typical OpenCL development. It also discusses data partitioning strategies that rely on being cognizant of the algorithm in question. Along the way, readers will come to realize that not all algorithms or problems require the same approach.

Chapter 6, Developing a Sobel Edge Detection Filter, will guide you in building an edge detection filter using Sobel's method. You will be introduced to some mathematical formality, including convolution theory in one and two dimensions, and its accompanying code. Finally, we introduce how profiling works in OpenCL and its application in this recipe.

Chapter 7, Developing the Matrix Multiplication with OpenCL, discusses parallelizing matrix multiplication by studying its parallel form and applying the transformation from sequential to parallel. Next, it optimizes the matrix multiplication by discussing how to increase the computation throughput and warm the cache.

Chapter 8, Developing the Sparse Matrix-Vector Multiplication with OpenCL, discusses the context of this computation and the conventional method used to solve it, that is, the conjugate gradient method, with just enough math. Once that intuition is developed, readers will be shown how various storage formats for sparse matrices can affect the parallel computation, and can then examine the ELLPACK, ELLPACK-R, COO, and CSR formats.

Chapter 9, Developing Bitonic Sort Using OpenCL, will introduce readers to the world of sorting algorithms and focus on the parallel sorting network also known as bitonic sort. This chapter works through the recipes as we did in all other chapters: presenting the theory and its sequential implementation, extracting the parallelism from the transformation, and then developing the final parallel version.

Chapter 10, Developing the Radix Sort with OpenCL, will introduce a classic example of a non-comparison-based sorting algorithm, in contrast to comparison-based sorts such as QuickSort, and one that suits a GPU architecture better. The reader is also introduced to another core parallel programming technique known as reduction, and we develop the intuition of how reduction helps radix sort perform better. The radix sort recipe also demonstrates multiple kernel programming and highlights its advantages as well as its disadvantages.

What you need for this book

You need to be comfortable working in a Linux environment, as the examples are tested against the Ubuntu 12.10 64-bit operating system. The following are the requirements:

  • GNU GCC C/C++ compiler Version 4.6.1 (at least)

  • OpenCL 1.2 SDK by AMD, Intel & NVIDIA

  • AMD APP SDK Version 2.8 with AMD Catalyst Linux Display Driver Version 13.4

  • Intel OpenCL SDK 2012

  • CMake Version 2.8 (at least)

  • Clang Version 3.1 (at least)

  • Microsoft Visual C++ 2010 (if you work on Windows)

  • Boost Library Version 1.53

  • VexCL (by Denis Demidov)

  • CodeXL Profiler by AMD (Optional)

  • At least eight hours of sleep

  • An open and receptive mind

  • A fresh brew of coffee or whatever works for you

Who this book is for

This book is intended for software developers who have often wondered what to do with that newly bought CPU or GPU other than using it for playing computer games. Having said that, this book isn't about toy algorithms that work only on your workstation at home. This book is ideal for developers who have a working knowledge of C/C++ and who want to learn how to write parallel programs in OpenCL that execute in heterogeneous computing environments.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "We can include other contexts through the use of the #include directive."

A block of code is set as follows:

cl_uint sortOrder = 0; // 0 = descending order, 1 = ascending order
cl_uint stages = 0;
// Count the halvings needed to reduce LENGTH to 1 (log2 for powers of two)
for (unsigned int i = LENGTH; i > 1; i >>= 1)
    ++stages;
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&device_A_in);
clSetKernelArg(kernel, 3, sizeof(cl_uint), (void*)&sortOrder);
#ifdef USE_SHARED_MEM
// A NULL value with a non-zero size reserves local memory for the kernel
clSetKernelArg(kernel, 4, (GROUP_SIZE << 1) * sizeof(cl_uint), NULL);
#elif defined(USE_SHARED_MEM_2)

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

cl_uint sortOrder = 0; // 0 = descending order, 1 = ascending order
cl_uint stages = 0;
// Count the halvings needed to reduce LENGTH to 1 (log2 for powers of two)
for (unsigned int i = LENGTH; i > 1; i >>= 1)
    ++stages;
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&device_A_in);
clSetKernelArg(kernel, 3, sizeof(cl_uint), (void*)&sortOrder);
#ifdef USE_SHARED_MEM
// A NULL value with a non-zero size reserves local memory for the kernel
clSetKernelArg(kernel, 4, (GROUP_SIZE << 1) * sizeof(cl_uint), NULL);
#elif defined(USE_SHARED_MEM_2)

Any command-line input or output is written as follows:

# gcc -Wall test.c -o test

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "clicking on the Next button moves you to the next screen".

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.