Many modern software applications run computations in parallel to take advantage of the multiple CPU cores available on almost any computer today. Many R programs can likewise be written to run in parallel. However, the extent of possible parallelism depends on the computing task involved. At one end of the scale are embarrassingly parallel tasks, where there are no dependencies between the parallel subtasks; such tasks can be made to run in parallel very easily. An example is building an ensemble of decision trees in a random forest algorithm: the randomized decision trees can be built independently of one another, in parallel across tens or hundreds of CPUs, and then combined to form the random forest. At the other end of the scale are tasks that cannot be parallelized, because each step of the task depends on the results of the previous step. One such example is a depth-first search of a tree, where the...
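The embarrassingly parallel end of the scale can be illustrated with a minimal sketch using R's built-in `parallel` package. This is not taken from the book: the function `simulate_subtask`, the input vector, and the choice of two workers are hypothetical, chosen only to show independent subtasks being spread across worker processes.

```r
library(parallel)

# Hypothetical stand-in for an independent subtask (for example, fitting
# one tree of an ensemble): each call depends only on its own input.
simulate_subtask <- function(i) {
  sum(sqrt(seq_len(i * 1000)))
}

inputs <- 1:8

# Serial baseline: run the subtasks one after another.
serial_results <- lapply(inputs, simulate_subtask)

# Parallel version: two worker processes (an arbitrary number chosen
# for illustration) each handle a share of the inputs.
cl <- makeCluster(2)
parallel_results <- parLapply(cl, inputs, simulate_subtask)
stopCluster(cl)

# Because no subtask depends on another, both runs should agree.
results_match <- isTRUE(all.equal(serial_results, parallel_results))
print(results_match)
```

`parLapply()` is used here rather than `mclapply()` because socket clusters also work on Windows; on a real workload the number of workers would more likely be derived from `detectCores()`.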
R High Performance Programming
Table of Contents (17 chapters)
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Understanding R's Performance – Why Are R Programs Sometimes Slow?
Profiling – Measuring Code's Performance
Simple Tweaks to Make R Run Faster
Using Compiled Code for Greater Speed
Using GPUs to Run R Even Faster
Simple Tweaks to Use Less RAM
Processing Large Datasets with Limited RAM
Multiplying Performance with Parallel Computing
Offloading Data Processing to Database Systems
R and Big Data
Index