In the first chapter, we introduced a number of general terms and concepts related to Big Data. In Chapter 2, Introduction to R Programming Language and Statistical Environment, we presented several frequently used methods for data management, processing, and analysis using the R language and its statistical environment. In this chapter, we merge both topics and explain how you can apply powerful R packages for mathematical and data modeling to large datasets, without the need for distributed computing. After reading this chapter, you should be able to:
Understand R's traditional limitations for Big Data analytics and how they can be resolved
Use R packages such as ff, ffbase, ffbase2, and bigmemory to enhance out-of-memory performance
Apply statistical methods to large R objects through the biglm and ffbase packages
Enhance the speed of data processing with R libraries supporting parallel computing
Benefit from faster data...