Book Image

R High Performance Programming

Book Image

R High Performance Programming

Overview of this book

Table of Contents (17 chapters)
R High Performance Programming
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Executing tasks in parallel on a cluster of computers


By using the parallel package, we are not limited to running parallel code on a single computer; we can also do it on a cluster of computers. This allows much larger computational tasks to be performed, irrespective of whether we use data parallelism or task parallelism. Only socket-based clusters can be used for this purpose, as processes cannot be forked onto a different computer.

There are many ways to set up a cluster of computers to work with R. To keep things simple, all computers in the cluster should have the same configuration for R—the same version of R, installed in the same directories, installed with the same versions of any packages required, and running on the same operating system. The examples in this section have been tested on a cluster of three computers running Ubuntu 14.04—one master node and two worker nodes.

The master and worker nodes should be on the same network and able to communicate with each other via SSH...