Now that we understand the data, we can continue with this particular statistical example.
First, the data scientist will need to load the data into an R data frame object. This example calls it german_raw.
# --- load the data
german_raw <- read.table("german.data", quote = "\"")
The next step is to provide column names that match the data schema table shown earlier:
names(german_raw) <- c("checking", "duration", "creditHistory", "purpose", "credit", "savings", "employment", "installmentRate", "personal", "debtors", "presentResidence", "property", "age", "otherPlans", "housing", "existingBankCredits", "job", "dependents", "telephone", "foreign", "risk")
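Before moving on, it can be worth confirming that the load and the renaming behaved as expected. The following is a minimal sketch, not part of the original example: it assumes the file is space-separated with 21 fields per row, and it uses two made-up rows (toy_lines, toy_raw, and schema_names are hypothetical names) so that it runs without german.data being present.

```r
# Two hypothetical rows that mimic the assumed space-separated layout of
# german.data. The values are invented for illustration only.
toy_lines <- c(
  "A12 24 A32 A40 3000 A61 A73 3 A92 A101 2 A123 35 A143 A152 1 A173 1 A191 A201 1",
  "A14 12 A34 A43 1500 A65 A75 4 A93 A101 4 A121 50 A143 A151 2 A172 2 A192 A201 2"
)

# Same call as in the text, but reading from a character vector via `text =`
# instead of from the file on disk.
toy_raw <- read.table(text = toy_lines, quote = "\"")

# The column names used in the text, kept in a vector so the count can be checked.
schema_names <- c("checking", "duration", "creditHistory", "purpose", "credit",
                  "savings", "employment", "installmentRate", "personal",
                  "debtors", "presentResidence", "property", "age",
                  "otherPlans", "housing", "existingBankCredits", "job",
                  "dependents", "telephone", "foreign", "risk")

# Fail early if the number of names does not match the number of columns.
stopifnot(length(schema_names) == ncol(toy_raw))
names(toy_raw) <- schema_names

# Inspect the resulting column types.
str(toy_raw)

# Columns that were not parsed as numbers are the categorical candidates.
categorical <- names(toy_raw)[!sapply(toy_raw, is.numeric)]
print(categorical)
```

Running the same stopifnot() and str() calls against german_raw itself would give a quick check that every row was split into the expected number of columns before any further processing.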
Note from the data schema (the table describing the features in the data) that we have a lot of categorical features to deal with. For this reason, a data scientist could employ the dummyVars()
function from the R caret package (which can be used to create a full set of dummy variables...