We noticed earlier that a dataset can be stored in any format using different software as well as relational databases. Usually, large-scale datasets are stored in database software. In data mining and statistical learning, we need to process large-scale datasets. One of the major problems in R is memory usage. R is RAM intensive, and for that reason, the size of a dataset should be much smaller than its RAM. Also, one of the major drawbacks of R is its inability to deal with large datasets.
This chapter introduces how to deal with large datasets that are bigger than the computer's memory and dealing with a dataset by interacting with database software. In the first few sections, we describe how to interact with database software with Open Database Connectivity (ODBC) and import datasets. This chapter will present an example of memory issues and then describe ODBC using an example of MS Excel and MS Access, dealing with large datasets with specialized contributed...