Next, we will generate test and training datasets so that we can validate any models produced. There are many ways of generating test and training sets.
In earlier chapters, we used the createDataPartition function from the caret package. For this example, we will generate the test and training data using base R functions instead. Please refer to the outline of the code here, and then run the code that follows:
- Set a variable, TrainingRows, to the number of rows that corresponds to the percentage of the data to designate as training data. In this example, we will use 75%.
- Use the sample() function to randomize the order of the rows, and assign the result to a new dataframe named ChurnStudy.
- Then select the first TrainingRows rows as the training data. Since the rows of ChurnStudy have already been shuffled, selecting a percentage of rows sequentially from this random ordering is a convenient and valid way to draw a training sample.
- The remaining rows (TrainingRows+1 to the end) form the testing dataset. Assign them to ChurnStudy.test.
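The steps above can be sketched in base R as follows. This is a minimal illustration, not the chapter's own code: it builds a small made-up df in place of the churn data, uses an arbitrary seed, and stores the training rows in a hypothetical dataframe named ChurnStudy.train. Only TrainingRows, ChurnStudy and ChurnStudy.test come from the outline. Rather than passing the dataframe itself to sample(), it shuffles the row indices and reorders the rows with them.

```r
# Hypothetical stand-in for the churn data; in the chapter, df is already loaded.
set.seed(123)  # arbitrary seed, used only to make the shuffle reproducible
df <- data.frame(
  id      = 1:100,
  tenure  = sample(1:72, 100, replace = TRUE),
  churned = sample(c(0, 1), 100, replace = TRUE)
)

# Number of rows that corresponds to 75% of the data
TrainingRows <- floor(0.75 * nrow(df))

# Shuffle the rows: sample() on an index vector returns a random permutation of it
ChurnStudy <- df[sample(seq_len(nrow(df))), ]

# The first TrainingRows shuffled rows form the training set (name is hypothetical)
ChurnStudy.train <- ChurnStudy[1:TrainingRows, ]

# Rows TrainingRows+1 to the end form the test set
ChurnStudy.test <- ChurnStudy[(TrainingRows + 1):nrow(ChurnStudy), ]

cat("Training rows:", nrow(ChurnStudy.train), "\n")
cat("Test rows:", nrow(ChurnStudy.test), "\n")
```

Because the test set is defined as the complement of the training rows, the two sets never overlap, and every original row lands in exactly one of them. Calling set.seed() before sample() means the same split is produced on every run, which makes later model comparisons repeatable.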
Once we have generated the...