In the previous chapters, we used a family of built-in functions such as read.csv and read.table to import data from separator-delimited files, such as those in the csv format. Using text formats to store data is handy and portable. When the data file is large, however, such a storage method may not be the best way.
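As a reminder of that workflow, here is a minimal sketch. It assumes a small, made-up product table written to a temporary file; the column names and values are purely illustrative. It shows that read.csv parses the entire file into a data frame, and that any subsetting happens only afterward, in memory.

```r
# Hypothetical sample data, written to a temporary file for illustration only
path <- tempfile(fileext = ".csv")
products <- data.frame(
  id = c("T01", "T02", "M01"),
  type = c("toy", "toy", "model"),
  price = c(12.5, 8.0, 30.0),
  stringsAsFactors = FALSE
)
write.csv(products, path, row.names = FALSE)

# read.csv() parses the entire file into an in-memory data frame
all_rows <- read.csv(path, stringsAsFactors = FALSE)

# read.table() reads the same file once the separator and header are specified
same_rows <- read.table(path, sep = ",", header = TRUE,
                        stringsAsFactors = FALSE)

# Filtering is only possible after every row has been loaded
toys <- all_rows[all_rows$type == "toy", ]

# Remove the temporary file
unlink(path)
```

With three rows this costs nothing, but the same pattern applied to a file larger than the available memory would fail at the read.csv step, before any filtering could take place.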
There are three main reasons why text formats stop being easy to use once the data grows large. They are as follows:
Functions such as read.csv() load the whole file into memory, that is, into a data frame in R. If the data is too large to fit into the computer's memory, we simply cannot load it.
Even if the dataset is large, we usually don't have to load the whole dataset into memory when we work on a task. Instead, we often need to extract a subset of the dataset that meets a certain condition. The built-in data-importer functions simply do not support querying a csv file.
The dataset is still being updated, that is, we need to insert records into the dataset...