OpenCL Programming by Example

Overview of this book

Research in parallel programming has been a mainstream topic for a decade, and will continue to be so for many decades to come. Many parallel programming standards and frameworks exist, but they take into account only one type of hardware architecture. Today, computing platforms come with many heterogeneous devices. OpenCL provides a royalty-free standard for programming heterogeneous hardware. This guide offers compact coverage of all the major topics of OpenCL programming. It explains optimization techniques and strategies in depth, using illustrative examples, and also provides case studies from diverse fields. Beginning with a discussion of the OpenCL models, this book explores their architectural view, programming interfaces, and primitives. It gradually demystifies the process of identifying data and task parallelism in diverse algorithms. It presents examples from different domains to show how problems within those domains can be solved more efficiently using OpenCL. You will learn about parallel sorting, histogram generation, JPEG compression, linear and parabolic regression, and k-nearest neighbors, a clustering algorithm in pattern recognition. Following on from this, optimization strategies are explained with matrix multiplication examples. You will also learn how OpenGL and OpenCL interoperate. "OpenCL Programming by Example" explains OpenCL in the simplest possible language, which beginners will find easy to understand. Developers and programmers from different domains who want to accelerate their applications will find this book very useful.

Memory fences


The OpenCL C specification provides a runtime barrier that synchronizes the work items within a single work group. A barrier can only synchronize work items in the same work group; there is no way to synchronize between different work groups from inside a kernel. To synchronize across work groups, the kernel must complete its execution. The barrier function takes a flag that selects one of two types of memory fence:

  • CLK_LOCAL_MEM_FENCE: This ensures correct ordering of operations on local memory. It is used as follows:

    barrier(CLK_LOCAL_MEM_FENCE);

    The barrier function will either flush any variables stored in local memory or queue a memory fence to ensure correct ordering of memory operations to local memory.

  • CLK_GLOBAL_MEM_FENCE: This ensures correct ordering of operations on global memory. It is used as follows:

    barrier(CLK_GLOBAL_MEM_FENCE);

In short, use CLK_LOCAL_MEM_FENCE when work items read and write the __local memory space, and CLK_GLOBAL_MEM_FENCE when they read and write the __global memory space.
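The two flags can be illustrated with a minimal sketch in OpenCL C. The kernel names (reverse_block, rotate_block) and their arguments are hypothetical and chosen only for this illustration. Both kernels assume a one-dimensional NDRange whose global size is a multiple of the work group size, and they need a host program to build and enqueue them.

```c
// Hypothetical kernel: reverses each work-group-sized block of `data` in place.
// Assumes the host sizes `tile` to hold one float per work item in the group.
__kernel void reverse_block(__global float *data, __local float *tile)
{
    const size_t lid = get_local_id(0);    // index inside the work group
    const size_t gid = get_global_id(0);   // index inside the whole NDRange
    const size_t n   = get_local_size(0);  // work items per work group

    // Each work item stages its own element in local memory.
    tile[lid] = data[gid];

    // Every work item in the group must finish writing `tile` before any
    // work item reads a slot that a different work item wrote.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Read the mirrored slot and write it back to this work item's element.
    data[gid] = tile[n - 1 - lid];
}

// Hypothetical kernel: work items of one group exchange values through
// global memory. `scratch` and `out` are assumed to be as large as `in`.
__kernel void rotate_block(__global const float *in,
                           __global float *scratch,
                           __global float *out)
{
    const size_t lid  = get_local_id(0);
    const size_t gid  = get_global_id(0);
    const size_t n    = get_local_size(0);
    const size_t base = gid - lid;         // first element of this group's block

    // Each work item writes an intermediate result to global memory.
    scratch[gid] = in[gid] * 2.0f;

    // Orders the global-memory writes above before the reads below,
    // but only among the work items of this work group.
    barrier(CLK_GLOBAL_MEM_FENCE);

    // Read the value written by the next work item in the same group.
    out[gid] = scratch[base + (lid + 1) % n];
}
```

A kernel that exchanges data through both address spaces can combine the flags with a bitwise OR, as in barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE). In either case, every work item in the work group must reach the barrier, so it should not sit inside a branch that only some work items of the group take.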

Sometimes...