R High Performance Programming

Implementing data parallel algorithms


Several R packages allow code to be executed in parallel. The parallel package that comes with R provides the foundation for most parallel computing capabilities in other packages. Let's see how it works with an example.

This example involves finding documents that match a regular expression. Regular expression matching can be computationally expensive, depending on the complexity of the expression. The corpus, or set of documents, for this example is a sample of the Reuters-21578 dataset for the topic corporate acquisitions (acq) from the tm package. Because this dataset contains only 50 documents, they are replicated 100,000 times to form a corpus of 5 million documents, so that parallelizing the code leads to meaningful savings in execution time.

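library(tm)
# Load the sample of 50 Reuters-21578 documents on corporate acquisitions
data("acq")
# Extract the text of each document, then replicate the 50 texts 100,000 times
textdata <- rep(sapply(content(acq), content), 1e5)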

The task is to find documents that match the regular expression \d+(,\d+)? mln dlrs, which represents monetary...
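
As a small-scale sketch of this task, the snippet below applies the same pattern to three made-up sentences, first serially with grepl and then across worker processes with parLapply from the parallel package. The sample sentences, the variable names, and the cluster size of two are assumptions chosen for illustration; they are not taken from the Reuters sample, and the full corpus may call for a different setup.

```r
library(parallel)

# Hypothetical stand-ins for documents (not from the Reuters-21578 sample)
docs <- c(
  "The company paid 12 mln dlrs for the unit.",
  "Sales rose to 1,250 mln dlrs last year.",
  "No financial terms were disclosed."
)

# The pattern from the text; backslashes are doubled inside an R string literal
pattern <- "\\d+(,\\d+)? mln dlrs"

# Serial matching: grepl is vectorized and returns one logical per document
serial <- grepl(pattern, docs)

# Data-parallel matching: parLapply splits the documents across the workers.
# The pattern is passed as an explicit argument so each worker receives it.
cl <- makeCluster(2)  # two workers is an arbitrary choice for this sketch
parallel_res <- unlist(
  parLapply(cl, docs, function(d, pattern) grepl(pattern, d), pattern = pattern)
)
stopCluster(cl)

print(serial)
print(parallel_res)
```

Both approaches should flag the first two sentences and reject the third. With only three short strings, the cost of starting the workers far outweighs any gain, which is why the text scales the corpus up before timing anything.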