It is often necessary to derive new variables from existing ones in order to improve a prediction. We have already seen that binning is a common way to create a nominal variable from a quantitative one.
Let's create a new column, called agecat, which divides age into two segments. To keep things simple, we will start off by rounding the age to the nearest integer.
filtered <- SparkR::filter(out_sd, "age > 0 AND insulin > 0")
filtered$age <- round(filtered$age, 0)
filtered$agecat <- ifelse(filtered$age <= 35, "<= 35", "35 Or Older")
SparkR::head(SparkR::select(filtered, "age", "agecat"))
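To see the binning logic without a Spark session, the same steps can be sketched in base R on a small local data frame. This is only an illustration: the data frame people and its values are hypothetical stand-ins for out_sd, and the cut() line is an assumed alternative for cases with more than two bins, not part of the original workflow.

```r
# Hypothetical local data standing in for the Spark DataFrame out_sd
people <- data.frame(
  age     = c(21.4, 35.2, 35.6, 50.9, 0),
  insulin = c(94, 168, 88, 543, 120)
)

# Keep only rows with positive age and insulin (mirrors the filter step)
people <- people[people$age > 0 & people$insulin > 0, ]

# Round age to the nearest integer, then split it into two segments
people$age <- round(people$age, 0)
people$agecat <- ifelse(people$age <= 35, "<= 35", "35 Or Older")

# Assumed alternative: cut() produces the same bins and extends to more breaks
people$agecat_cut <- as.character(
  cut(people$age, breaks = c(0, 35, Inf), labels = c("<= 35", "35 Or Older"))
)

print(people[, c("age", "agecat", "agecat_cut")])
```

The ifelse() form is convenient for a two-way split; cut() is usually easier to maintain once there are three or more segments, since only the breaks and labels change.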
In the SparkR code which you just ran, you may notice that some commands are prefixed with SparkR::.
This tells R which package's version of the function we wish to apply. It is good practice to qualify commands in this way, in order to avoid errors and the misapplication of identically named functions which occur between SparkR...