Before you start to explore data, you must load it into the R session. This recipe introduces methods to load data either from a file into memory or from the datasets predefined in R.
First, start an R session on your machine. As this recipe involves file I/O, any read or write for which you do not specify a full path takes place in the current working directory.
You can simply type getwd() in the R session to obtain the current working directory location. However, if you would like to change the current working directory, you can use setwd("<path>"), where <path> is replaced with your desired path.
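As a minimal sketch of how the working directory affects relative paths, the following example switches to a temporary directory, writes a file by its relative name, and then restores the original directory. The use of tempdir() as a stand-in for <path> and the file name wd-demo.txt are assumptions made only for illustration:

```r
# Remember the current working directory so it can be restored later.
old.wd <- getwd()

# tempdir() is only a stand-in for "<path>" (an assumption of this sketch).
setwd(tempdir())

# A relative file name now resolves inside the new working directory.
# "wd-demo.txt" is a hypothetical file name.
writeLines("hello", "wd-demo.txt")
file.found <- file.exists(file.path(tempdir(), "wd-demo.txt"))

# Restore the original working directory.
setwd(old.wd)
```

Saving the result of getwd() before calling setwd() makes it easy to return to where you started.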
Perform the following steps to read and write data with R:
To view the built-in datasets of R, type the following command:
> data()
R will return a list of datasets in the datasets package, and the list comprises the name and description of each dataset.
To load the dataset iris into an R session, type the following command:
> data(iris)
The iris dataset is now loaded as a data frame, which is a common data structure in R for storing a data table.
To view the data type of iris, simply use the class function:
> class(iris)
[1] "data.frame"
The printed "data.frame" shows that the iris dataset is stored as a data frame.
Use the save function to store an object in a file. For example, to save the loaded iris data into myData.RData, use the following command:
> save(iris, file="myData.RData")
Use the load function to read a saved object into an R session. For example, to load iris data from myData.RData, use the following command:
> load("myData.RData")
In addition to using built-in datasets, R also provides a function to import data from text into a data frame. For example, the read.table function can format a given text into a data frame:
> test.data = read.table(header = TRUE, text = "
+ a b
+ 1 2
+ 3 4
+ ")
You can also use col.names and row.names to specify the names of columns and rows:
> test.data = read.table(text = "
+ 1 2
+ 3 4",
+ col.names=c("a","b"),
+ row.names = c("first","second"))
View the class of the test.data variable:
> class(test.data)
[1] "data.frame"
The class function shows that the test.data variable contains a data frame.
In addition to importing data by using the read.table function, you can use the write.table function to export data to a text file:
> write.table(test.data, file = "test.txt" , sep = " ")
The write.table function will write the content of test.data into test.txt (the written path can be found by typing getwd()), with a space as the delimiter.
Similar to write.table, write.csv can also export data to a file. However, write.csv uses a comma as the default delimiter:
> write.csv(test.data, file = "test.csv")
With the read.csv function, the csv file can be imported as a data frame. However, the last example writes the column and row names of the data frame to the test.csv file. Therefore, setting header to TRUE and row.names to 1 (the first column) ensures that the header and the first column are not treated as values:
> csv.data = read.csv("test.csv", header = TRUE, row.names=1)
> head(csv.data)
  a b
1 1 2
2 3 4
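The text import and export steps above can be sketched as one round trip. This is only an illustrative sketch: the temporary paths created with tempfile() and the variable names txt.file, csv.file, txt.data, same.txt, and same.csv are assumptions that do not appear in the recipe, and temporary files are used so that the working directory is left untouched:

```r
# Build a small data frame like the one used in the steps above.
test.data <- read.table(header = TRUE, text = "
a b
1 2
3 4
")

# Hypothetical temporary file paths (assumption: used instead of
# test.txt and test.csv so nothing is written to the working directory).
txt.file <- tempfile(fileext = ".txt")
csv.file <- tempfile(fileext = ".csv")

# write.table/read.table round trip with a space delimiter.
# The header has one fewer field than the data rows, so read.table
# takes the first column as row names.
write.table(test.data, file = txt.file, sep = " ")
txt.data <- read.table(txt.file, header = TRUE, sep = " ")

# write.csv/read.csv round trip; row.names = 1 marks the first
# column as row names rather than values.
write.csv(test.data, file = csv.file)
csv.data <- read.csv(csv.file, header = TRUE, row.names = 1)

# Compare the column values with the original data frame.
same.txt <- identical(txt.data$a, test.data$a) &&
  identical(txt.data$b, test.data$b)
same.csv <- identical(csv.data$a, test.data$a) &&
  identical(csv.data$b, test.data$b)
```

The comparison is made column by column because the row names come back from the files as text, so the restored data frames may differ from the original in their row name attribute even though the values match.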
Generally, data for collection may be in multiple files and different formats. To exchange data between files and RData, R provides many built-in functions, such as save, load, read.csv, read.table, write.csv, and write.table.
This example first demonstrates how to load the built-in dataset iris into an R session. The iris dataset is one of the most famous and commonly used datasets in the field of machine learning, which is why we use it as the example here. The recipe shows how to save RData and load it with the save and load functions. Furthermore, the example explains how to use read.table, write.table, read.csv, and write.csv to exchange data between files and a data frame. The R I/O functions are very important because most data sources are external, so you have to use these functions to load data into an R session.
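The save and load behavior described above can be sketched as a small round trip. The names iris.copy, rdata.file, and restored are hypothetical, and loading into a separate environment with the envir argument is a choice made for this sketch (the recipe itself loads into the current session):

```r
# Load the built-in dataset and take a copy under a hypothetical name.
data(iris)
iris.copy <- iris

# Hypothetical temporary file, used here instead of myData.RData.
rdata.file <- tempfile(fileext = ".RData")
save(iris.copy, file = rdata.file)

# Load into a fresh environment so existing objects are not overwritten.
restored <- new.env()
loaded.names <- load(rdata.file, envir = restored)

# load() returns the names of the restored objects.
restored.ok <- identical(restored$iris.copy, iris.copy)
```

Because load restores objects under the names they were saved with, checking its return value is a simple way to see what a file contained.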
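For the read.table and read.csv functions, the file to be read can also be a complete URL (for supported URLs, use ?url for more information). The load function expects a file name or a connection, so to load from a URL, wrap it in a connection, for example, load(url("<URL>")).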
On some occasions, data may be in an Excel file instead of a flat text file. The WriteXLS package allows writing an object into an Excel file with a given variable in the first argument and the file to be written in the second argument:
Install the WriteXLS package:
> install.packages("WriteXLS")
Load the WriteXLS package:
> library("WriteXLS")
Use the WriteXLS function to write the data frame iris into a file named iris.xls:
> WriteXLS("iris", ExcelFileName="iris.xls")