The first step in any data analysis is preparing the data. The rest of this chapter mostly deals with this topic, but here we review some basic considerations and R techniques. The most important part of any data analysis is knowing the dataset and having some idea of how each of its variables was created.
For a basic overview, we will use the pumpkin dataset, which is short and artificial. Have a look at the data it contains:
> pumpkins <- read.csv('messy_pumpkins.txt', stringsAsFactors = FALSE)
> pumpkins
      weight      location
1        2.3        europe
2      2.4kg       Europee
3     3.1 kg           USA
4 2700 grams United States
5         24          U.S.
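Before cleaning anything, it can help to list what is actually wrong with each column. The following is only a minimal sketch of such an inspection: it rebuilds the five rows shown above by hand (an assumption, so that it runs without messy_pumpkins.txt), and the names weight_num and not_numeric are illustrative, not part of the dataset.

```r
# Rebuild the five rows shown above so the sketch runs without the file.
pumpkins <- data.frame(
  weight   = c("2.3", "2.4kg", "3.1 kg", "2700 grams", "24"),
  location = c("europe", "Europee", "USA", "United States", "U.S."),
  stringsAsFactors = FALSE
)

# Show the column types; weight is character, not numeric,
# because some entries carry a unit suffix.
str(pumpkins)

# List the distinct spellings used for the location.
unique(pumpkins$location)

# as.numeric() returns NA (with a warning) for entries that are not
# plain numbers, so the NAs point at the weights that need attention.
weight_num  <- suppressWarnings(as.numeric(pumpkins$weight))
not_numeric <- pumpkins$weight[is.na(weight_num)]
not_numeric
```

Note that even the entries that do convert cleanly, such as 24, still have to be judged against what we know about how the variable was recorded; the conversion alone says nothing about the unit.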