Mastering Parallel Programming with R

By: Simon R. Chapple, Terence Sloan, Thorsten Forster, Eilidh Troup

Overview of this book

R is one of the most popular programming languages used in data science. Applying R to big data and complex analytic tasks requires harnessing scalable compute resources. Mastering Parallel Programming with R presents a comprehensive and practical treatise on how to build highly scalable and efficient algorithms in R. It will teach you a variety of parallelization techniques, from simple use of the lapply() variants in R's built-in parallel package to high-level AWS cloud-based Hadoop and Apache Spark frameworks. It will also teach you low-level scalable parallel programming using Rmpi and pbdMPI for message passing, applicable to clusters and supercomputers, and how to exploit GPUs, with their thousands of simple processors, through ROpenCL. By the end of the book, you will understand the factors that influence parallel efficiency, including assessing code performance and implementing load balancing; pitfalls to avoid, including deadlock and numerical instability issues; how to structure your code and data for the most appropriate type of parallelism for your problem domain; and how to extract the maximum performance from your R code running on a variety of computer systems.
Table of Contents (13 chapters)

Random numbers


Random numbers take on new significance in parallel programs because you usually want each of a set of cooperating parallel processes to use a different random number sequence; simulation and optimum-search workloads are prime examples.
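As a rough illustration of giving each process its own sequence, the sketch below derives separate streams with nextRNGStream() from R's parallel package. It assumes the "L'Ecuyer-CMRG" generator rather than R's default, because that is the generator nextRNGStream() works with. The worker count and the helper draw_from_stream() are hypothetical names used only for this example, and everything runs in a single R session rather than on real workers.

```r
library(parallel)

# Remember the current generator settings so they can be restored afterwards.
old_kind <- RNGkind()

# Assumption: use the L'Ecuyer-CMRG generator, which supports multiple streams.
RNGkind("L'Ecuyer-CMRG")
set.seed(2024)

n_workers <- 3  # hypothetical number of cooperating processes

# Build one seed per worker: each seed is the start of the next stream.
streams <- vector("list", n_workers)
streams[[1]] <- get(".Random.seed", envir = globalenv())
for (i in seq_len(n_workers)[-1]) {
  streams[[i]] <- nextRNGStream(streams[[i - 1]])
}

# Hypothetical helper: install a stream's seed, then draw n uniform numbers.
draw_from_stream <- function(seed, n) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(n)
}

# Each "worker" draws from its own stream, so the sequences differ.
draws <- lapply(streams, draw_from_stream, n = 3)

# Restore the generator settings that were in force before the example.
do.call(RNGkind, as.list(old_kind))
```

On a real cluster created with makeCluster(), the parallel package function clusterSetRNGStream() distributes such streams to the workers for you.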

The default random number generator in R is the Mersenne Twister, which is generally recognized as a good-quality pseudorandom number generator, though it is not cryptographically secure.

Note

Mersenne Twister

To find out more about the properties of the Mersenne Twister random number generator (RNG) you can refer to:

https://en.wikipedia.org/wiki/Mersenne_Twister

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

You can, of course, select an alternative generator from the set of built-ins, or supply your own, using the base R function RNGkind().
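A minimal sketch of inspecting and switching the generator with RNGkind() follows. The choice of "L'Ecuyer-CMRG" as the alternative and the seed value 123 are assumptions made for illustration; any other built-in kind would be selected the same way.

```r
# Record the generator settings currently in force so they can be restored.
old_kind <- RNGkind()

# Switch to another built-in generator (an arbitrary choice for this example).
RNGkind(kind = "L'Ecuyer-CMRG")
new_kind <- RNGkind()[1]

# Seeding makes the sequence reproducible under the selected generator.
set.seed(123)
x <- runif(3)
set.seed(123)
y <- runif(3)
same_draws <- identical(x, y)

# Restore the original settings (kind, normal.kind and, where present, sample.kind).
do.call(RNGkind, as.list(old_kind))
```

Saving and restoring the settings keeps the change local, which matters when the code is part of a larger script that expects the generator it started with.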

R in itself has always been a single-threaded implementation and is not designed to exploit parallelism within its own language primitives; it relies on specifically...