R High Performance Programming

Extracting data into R versus processing data in a database


Most R programmers are comfortable manipulating data in R using R data structures and packages. This requires moving all the data into R, whether in memory or on disk, on a single computer or on a cluster. In some situations, this is inefficient, especially if the data changes constantly and must be refreshed often: extracting data from a database or data warehouse every time it needs to be analyzed takes a lot of time and computational resources. In some cases, it might not be feasible at all to move terabytes or more of data from their sources into R.

Instead of moving the data into R, another approach is to move the computational tasks to the data. In other words, we can process the data in the database and retrieve into R only the results, which are usually much smaller than the raw data. This reduces the amount of network bandwidth required to transmit the data and the local storage and memory...
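
The contrast between the two approaches can be illustrated with a minimal sketch. It assumes the DBI and RSQLite packages are installed. The in-memory SQLite database, the `sales` table, and its `region` and `amount` columns are hypothetical stand-ins for a real database and are not taken from the text.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for a remote data source.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table of raw records.
sales <- data.frame(
  region = c("east", "west", "east", "west", "east"),
  amount = c(10, 20, 30, 40, 50)
)
dbWriteTable(con, "sales", sales)

# Approach 1: extract every row into R, then aggregate in R.
raw <- dbGetQuery(con, "SELECT region, amount FROM sales")
totals_in_r <- aggregate(amount ~ region, data = raw, FUN = sum)

# Approach 2: aggregate inside the database and retrieve only the summary.
totals_in_db <- dbGetQuery(
  con,
  "SELECT region, SUM(amount) AS amount
     FROM sales
    GROUP BY region
    ORDER BY region"
)

dbDisconnect(con)
```

Both approaches produce the same per-region totals, but the first transfers every row of `sales` into R, while the second transfers only one row per region. With a table this small the difference is negligible. The sketch only shows where the work is done, not how much time or bandwidth is saved.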