Chapter 9. Offloading Data Processing to Database Systems
We have learned many different ways to optimize the performance of R code for speed and memory efficiency. But sometimes R alone is not enough. Perhaps a very large dataset is stored in a data warehouse, and it would be infeasible to extract all the data into R for processing. We might even wish to use specially designed analytical databases that can perform computations on data much more efficiently than R can. In this chapter, we will learn how to tap into the power of external database systems from within R and combine that power with the flexibility and ease of use of the R language.
This chapter covers the following:
Extracting data into R versus processing data in a database
Preprocessing data in a relational database using SQL
Converting R expressions into SQL
Running statistical and machine learning algorithms in a database
Using columnar databases for improved performance
Using array databases for maximum scientific...