Another way in which R is CPU-limited is that, by default, it runs on only a single thread on the CPU. Even if you install R on a powerful server with 64 CPU cores, R will use only one of them. For example, finding the sum of a numeric vector is an operation that can be made to run in parallel on the CPU quite easily. If there are four CPU cores available, each core can be given roughly one quarter of the data to process. Each core computes the subtotal of the chunk of data it is given, and the four subtotals are then added up to find the total sum of the whole dataset. However, in R, the sum() function runs serially, processing the entire dataset on one CPU core.
In fact, many Big Data operations are similar in nature to this summation example, with the same task running independently on many subsets of data. In such scenarios, performing the operation sequentially would underuse today's mostly parallel computing architectures. In Chapter 8, Multiplying Performance with Parallel Computing, we will learn how to write parallel programs in R to overcome this limitation.
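The four-core summation described above can be sketched in a few lines of R. This is only a minimal illustration, assuming the parallel package that ships with R; the variable names (x, n_chunks, chunks, subtotals, total) are made up for this example, and it is not necessarily the approach taken in Chapter 8.

```r
library(parallel)

# Illustrative data: a numeric vector of one million values
x <- as.numeric(1:1000000)

# Assumption: four workers, matching the four-core example in the text
n_chunks <- 4

# Assign each element to one of four roughly equal, contiguous chunks
chunk_id <- cut(seq_along(x), breaks = n_chunks, labels = FALSE)
chunks <- split(x, chunk_id)

# Start four worker processes and let each compute the subtotal of one chunk
cl <- makeCluster(n_chunks)
subtotals <- parLapply(cl, chunks, sum)
stopCluster(cl)

# Add up the four subtotals to get the total sum of the whole dataset
total <- sum(unlist(subtotals))
```

For a vector this small, the cost of starting the workers and copying the data to them will most likely outweigh any gain, so the sketch shows the divide-and-combine pattern rather than a real speedup.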
R High Performance Programming