R High Performance Programming


Preprocessing data in a relational database using SQL


We will start by learning how to run SQL statements in the database from R. The first few examples show how processing data in the database, instead of moving all of it into R, can be faster even for simple operations.

To run the examples in this chapter, you will need a database server supported by R. The CRAN package RJDBC provides an interface to the JDBC drivers that most databases come with. Alternatively, search CRAN for packages such as RPostgreSQL, RMySQL, and ROracle, which offer features and optimizations specific to each database.
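As a rough illustration of the JDBC route, the following sketch opens a connection through RJDBC. It assumes the DBI and RJDBC packages are installed and that a PostgreSQL JDBC driver JAR has been downloaded. The helper names jdbc_url and connect_jdbc, the JAR path, and the database name are hypothetical placeholders, not part of any package.

```r
# Sketch only: assumes the DBI and RJDBC packages are installed.
# jdbc_url() and connect_jdbc() are hypothetical helpers for illustration.

# Build a PostgreSQL JDBC URL from its parts.
jdbc_url <- function(host, port, dbname) {
  sprintf("jdbc:postgresql://%s:%d/%s", host, as.integer(port), dbname)
}

# Load the JDBC driver from a JAR file and open a connection.
connect_jdbc <- function(jar_path, url, user, password) {
  drv <- RJDBC::JDBC(driverClass = "org.postgresql.Driver",
                     classPath = jar_path)
  DBI::dbConnect(drv, url, user, password)
}

# Example call (placeholder values; adjust to your own setup):
# con <- connect_jdbc("/path/to/postgresql.jar",
#                     jdbc_url("localhost", 5432, "mydb"),
#                     "myuser", "mypassword")
# DBI::dbGetQuery(con, "SELECT 1")
# DBI::dbDisconnect(con)
```

For another database, the driver class and the URL prefix would change to whatever that database's JDBC driver documents.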

The following examples are based on a PostgreSQL database and the RPostgreSQL package, as we will need them later in this chapter when we learn about the PivotalR package and the MADlib software. Feel free, however, to adapt the code to the database that you use.
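To make the idea of processing data inside the database concrete, here is a minimal sketch that computes the same per-group sum in two ways: by fetching every row into R, and by letting PostgreSQL aggregate with GROUP BY. It assumes the DBI and RPostgreSQL packages are installed. The sales table, its region and amount columns, and the helper names are hypothetical, and reading the connection settings from the PGDATABASE, PGHOST, PGUSER, and PGPASSWORD environment variables is a choice made for this sketch only.

```r
# Sketch only: assumes the DBI and RPostgreSQL packages are installed.
# The "sales" table and its "region" and "amount" columns are hypothetical.

# Approach 1: fetch every row, then aggregate inside R.
sum_in_r <- function(rows) {
  aggregate(amount ~ region, data = rows, FUN = sum)
}

# Approach 2: let the database aggregate and return one row per region.
sql_sum_in_db <-
  "SELECT region, SUM(amount) AS amount FROM sales GROUP BY region"

# Time both approaches on an open connection.
compare_approaches <- function(con) {
  t_r <- system.time({
    rows <- DBI::dbGetQuery(con, "SELECT region, amount FROM sales")
    res_r <- sum_in_r(rows)
  })
  t_db <- system.time(res_db <- DBI::dbGetQuery(con, sql_sum_in_db))
  list(in_r = t_r, in_db = t_db, result_r = res_r, result_db = res_db)
}

# Run only when a database name is supplied through PGDATABASE.
if (nzchar(Sys.getenv("PGDATABASE"))) {
  con <- DBI::dbConnect(
    RPostgreSQL::PostgreSQL(),
    dbname = Sys.getenv("PGDATABASE"),
    host = Sys.getenv("PGHOST", "localhost"),
    user = Sys.getenv("PGUSER"),
    password = Sys.getenv("PGPASSWORD")
  )
  print(compare_approaches(con))
  DBI::dbDisconnect(con)
}
```

The second approach transfers one row per region instead of the whole table, which is where any speedup would be expected to come from. Actual timings depend on the table size, the network, and the server.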

Configuring PostgreSQL to work with R involves setting up both the server and the client. First, we need to set up the PostgreSQL...