This first coding stage should include every process that is completely independent of the application. Although such processes can eventually be scheduled automatically (for example, if the data source changes over time and has to be refreshed), they can be thought of as steps that need to run only once each time the data source changes.
In our example, this stage covers the elimination of variables and the recoding. Once these steps are done, the processed data has to be saved so that the application can load it directly. The following piece of code loads the dataset in the same way as before and eliminates the corresponding columns:
# Retrieve data
data.adult <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                       header = FALSE)
names(data.adult) <- c("age", "workclass", "fnlwgt", "education", "education.num",
                       "marital.status", "occupation", "relationship", "race", "sex",
                       "capital.gain", "capital.loss", "hours.per.week",
                       "native.country","earnings...
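To make the three steps of this stage concrete (eliminate variables, recode, save), here is a minimal sketch on a tiny made-up data frame rather than on the real download. The values, the choice of columns to drop, the `high.earner` recoding, and the output file name are all assumptions for illustration only; they are not the columns or recodings used in this example.

```r
# Toy stand-in for the dataset (made-up values, illustration only)
toy <- data.frame(
  age = c(30, 45, 52),
  fnlwgt = c(1000, 2000, 3000),
  education = c("Bachelors", "HS-grad", "Masters"),
  education.num = c(13, 9, 14),
  earnings = c("<=50K", "<=50K", ">50K"),
  stringsAsFactors = FALSE
)

# 1. Eliminate variables (which columns to drop is an assumption here)
drop.cols <- c("fnlwgt", "education.num")
toy <- toy[, !(names(toy) %in% drop.cols), drop = FALSE]

# 2. Recode: turn earnings into a logical flag (hypothetical recoding)
toy$high.earner <- toy$earnings == ">50K"

# 3. Save the processed data so the application can load it directly
out.file <- file.path(tempdir(), "toy_processed.rds")
saveRDS(toy, out.file)

# The application would then start from the saved file
processed <- readRDS(out.file)
```

Saving with `saveRDS()` keeps column types intact, so the application can call `readRDS()` without repeating any of the preprocessing.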