Mastering Parallel Programming with R

By: Simon R. Chapple, Terence Sloan, Thorsten Forster, Eilidh Troup

Overview of this book

R is one of the most popular programming languages used in data science. Applying R to big data and complex analytic tasks requires harnessing scalable compute resources. Mastering Parallel Programming with R is a comprehensive and practical guide to building highly scalable and efficient algorithms in R. It teaches a variety of parallelization techniques, from simple use of the lapply() variants in R’s built-in parallel package to high-level AWS cloud-based Hadoop and Apache Spark frameworks. It also teaches low-level scalable parallel programming using Rmpi and pbdMPI for message passing, applicable to clusters and supercomputers, and shows how to exploit GPUs with thousands of simple processors through ROpenCL. By the end of the book, you will understand the factors that influence parallel efficiency, including assessing code performance and implementing load balancing; pitfalls to avoid, including deadlock and numerical instability issues; how to structure your code and data for the most appropriate type of parallelism for your problem domain; and how to extract the maximum performance from your R code running on a variety of computer systems.
Table of Contents (13 chapters)

Calling MPI code from R


Let's look at how to call existing MPI C code from R. The example that follows is useful when you already have C or C++ MPI code that you want to call from R. We will look at one simple way of doing this, although there are several. The definitive guide to calling code written in C or other languages from R is the Writing R Extensions manual, available from CRAN at http://cran.r-project.org/doc/manuals/r-release/R-exts.html.
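Before bringing MPI into the picture, it may help to see the general shape of a C function that R can call. The following is a minimal sketch, assuming R's .C() interface is the calling mechanism (the Writing R Extensions manual describes this and other interfaces). The function name scale_vector and its parameters are hypothetical and chosen only for illustration; any MPI calls would go inside a function of this same shape.

```c
/*
 * Hypothetical example function, for illustration only.
 * With R's .C() interface every argument arrives as a pointer and
 * the function returns void, so results are written back in place.
 */
void scale_vector(double *x, int *n, double *factor)
{
    for (int i = 0; i < *n; i++) {
        x[i] *= *factor;   /* overwrite each element with its scaled value */
    }
}
```

Assuming the file is saved as scale_vector.c on Linux, it could be built with R CMD SHLIB scale_vector.c, loaded in R with dyn.load("scale_vector.so"), and called as .C("scale_vector", x = as.double(c(1, 2, 3)), n = as.integer(3), factor = as.double(2))$x. The explicit as.double() and as.integer() conversions matter because .C() does not check that the R types match the C signature.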

If you are writing from scratch the MPI code that you want to call from R, then you should consider using the Rcpp R package (see http://cran.r-project.org/web/packages/Rcpp/index.html). This package provides C++ wrappers for R data types, which makes it easy to transfer data between C++ and R. It also manages memory for you and provides other helper methods.

MPI Hello World

Let's start with a simple "Hello World" MPI C program, where each separate process prints hello and its MPI rank number.

#include <stdio...