OpenCL Programming by Example

Overview of this book

Research in parallel programming has been a mainstream topic for a decade and will remain so for decades to come. Many parallel programming standards and frameworks exist, but they take into account only one type of hardware architecture. Today's computing platforms come with many heterogeneous devices, and OpenCL provides a royalty-free standard for programming heterogeneous hardware. This guide offers compact coverage of all the major topics of OpenCL programming. It explains optimization techniques and strategies in depth, using illustrative examples, and provides case studies from diverse fields. Beginners and advanced application developers alike will find this book very useful. Beginning with a discussion of the OpenCL models, the book explores their architectural view, programming interfaces, and primitives. It gradually demystifies the process of identifying data and task parallelism in diverse algorithms. It presents examples from different domains to show how problems in those domains can be solved more efficiently using OpenCL. You will learn about parallel sorting, histogram generation, JPEG compression, linear and parabolic regression, and k-nearest neighborhood (k-NN), a classification algorithm used in pattern recognition. Optimization strategies are then explained with matrix multiplication examples. You will also learn how to make OpenGL and OpenCL interoperate. "OpenCL Programming by Example" explains OpenCL in the simplest possible language, which beginners will find easy to understand. Developers and programmers from different domains who want to accelerate their applications will find this book very useful.

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

Hello OpenCL

Advances in computer architecture

Different parallel programming techniques

Introduction to OpenCL

OpenCL components

An example of OpenCL program

OpenCL Architecture

Platform versions

Execution model

Application scaling

OpenCL Buffer Objects

Creating subbuffer objects

Histogram calculation

Reading and writing buffers

Copying buffers

Mapping buffer objects

Querying buffer objects

Undefined behavior of the cl_mem objects

OpenCL Images

Creating images

Reading and writing buffers

Copying and filling images

Mapping image objects

Querying image objects

Image histogram computation

OpenCL Program and Kernel Objects

Creating program objects

Creating kernel objects

Events and Synchronization

OpenCL events and monitoring these events

OpenCL event synchronization models

Coarse-grained synchronization

Event-based or fine-grained synchronization

Getting information about cl_event

User-created events

Event profiling

OpenCL C Programming

Built-in data types

Conversions and type casts

Address space qualifiers

Image access qualifiers

Storage class specifiers

Built-in functions

Basic Optimization Techniques with Case Studies

Finding the performance of your program?

Case study – matrix multiplication

Case study – Histogram calculation

Finding the scope of the use of OpenCL

Image Processing and OpenCL

Image representation

Implementing image filters

OpenCL implementation of filters

JPEG compression

OpenCL-OpenGL Interoperation

Introduction to OpenGL

Defining Interoperation

Implementing Interoperation

Case studies – Regressions, Sort, and KNN

Regression with least square curve fitting

k-Nearest Neighborhood (k-NN) algorithm

Index

General tips

Some of the following strategies are vendor- and architecture-specific, but most have a counterpart on other vendors' hardware and architectures.

Try to minimize memory transfers between the host and the device, and try to hide transfer latency by overlapping transfers with parallel computation. Host-device transfer has much lower bandwidth than global memory access (for example, on the NVIDIA GTX 280 global memory bandwidth is approximately 17 times that of PCI-e). It is therefore better to store data in global memory and keep it there. Sometimes it is even better to recompute a value on the GPU than to fetch it from the host.
One large transfer is much better than many smaller transfers amounting to the same total size.
Aim for coalesced memory access as much as possible, that is, avoid out-of-sequence and misaligned transactions. The details depend on the OpenCL device architecture and its compute capability.
Use local memory (about 100 times lower latency than global memory on the GTX 280) for caching, but be careful not to overuse it, to avoid a performance penalty due to spilling to global...
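The transfer and coalescing tips above can be made concrete with two small cost models. The following is only a sketch in plain C, not OpenCL host code and not taken from the book. The function names, the fixed per-transfer setup cost, the link bandwidth, and the 128-byte segment size are all hypothetical assumptions chosen for illustration. Real values depend on the device and driver and should be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated time in microseconds to move total_bytes across the
 * host-device link when the data is split into `chunks` transfers.
 * Assumption: every transfer pays the same fixed setup cost (setup_us)
 * and the payload moves at a constant rate (bytes_per_us). Both are
 * hypothetical inputs, not measured values. */
double transfer_time_us(double total_bytes, size_t chunks,
                        double setup_us, double bytes_per_us)
{
    assert(chunks > 0 && bytes_per_us > 0.0);
    return (double)chunks * setup_us + total_bytes / bytes_per_us;
}

/* Number of distinct aligned memory segments touched when `work_items`
 * consecutive work-items each read one element at index id * stride.
 * Fewer segments for the same number of work-items means the accesses
 * coalesce better.
 * Assumption: elem_bytes divides segment_bytes, so an element never
 * straddles two segments. The segment size is a hypothetical parameter. */
size_t segments_touched(size_t work_items, size_t stride,
                        size_t elem_bytes, size_t segment_bytes)
{
    assert(work_items > 0 && stride > 0);
    assert(elem_bytes > 0 && segment_bytes >= elem_bytes);

    size_t count = 0;
    size_t last_segment = 0;
    for (size_t id = 0; id < work_items; ++id) {
        /* Byte offsets grow with id, so counting segment changes is
         * enough to count distinct segments. */
        size_t segment = (id * stride * elem_bytes) / segment_bytes;
        if (id == 0 || segment != last_segment) {
            ++count;
            last_segment = segment;
        }
    }
    return count;
}
```

Under these assumed numbers, moving 8,000,000 bytes at 8,000 bytes per microsecond with a 10 microsecond setup cost takes 1,010 microseconds as one transfer but 11,000 microseconds as 1,000 transfers, because the setup cost is paid 1,000 times. Likewise, 32 work-items reading 4-byte elements with stride 1 touch a single 128-byte segment, while a stride of 32 elements touches 32 separate segments.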