We will once again visit the wine dataset that we used in Chapter 8, Cluster Analysis. If you recall, it consists of 13 numeric features and a response with three possible classes of wine. Our task is to predict those classes. I will include one interesting twist: artificially increasing the number of observations. The reasons are twofold. First, I want to fully demonstrate the resampling capabilities of the mlr package, and second, I wish to cover a synthetic sampling technique. We utilized upsampling in the prior section, so a synthetic technique is in order.
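To make the idea of synthetic sampling concrete before we touch the real data, here is a minimal base R sketch of the interpolation step that SMOTE-style methods rely on: pick an observation, pick one of its nearest neighbors, and create a new point somewhere on the line between them. The helper synth_points(), its arguments, and the toy matrix are my own illustrative assumptions; this is not the function from the DMwR package and it ignores class labels entirely.

```r
# Minimal sketch of SMOTE-style interpolation in base R.
# synth_points() is a hypothetical helper, not part of mlr or DMwR.
synth_points <- function(x, n_new, k = 5) {
  x <- as.matrix(x)
  stopifnot(nrow(x) > k)
  d <- as.matrix(dist(x))  # pairwise Euclidean distances
  new_rows <- matrix(NA_real_, nrow = n_new, ncol = ncol(x),
                     dimnames = list(NULL, colnames(x)))
  for (i in seq_len(n_new)) {
    base <- sample(nrow(x), 1)                  # random starting observation
    cand <- setdiff(seq_len(nrow(x)), base)     # every other observation
    nn <- cand[order(d[base, cand])[seq_len(k)]]  # its k nearest neighbors
    nb <- nn[sample.int(k, 1)]                  # choose one neighbor at random
    gap <- runif(1)                             # random position between the two
    new_rows[i, ] <- x[base, ] + gap * (x[nb, ] - x[base, ])
  }
  new_rows
}

# Toy data (assumed for illustration): 20 observations, 2 features
set.seed(123)
toy <- matrix(rnorm(40), ncol = 2, dimnames = list(NULL, c("f1", "f2")))
synthetic <- synth_points(toy, n_new = 30)
dim(synthetic)
```

Because every new row is a weighted average of two existing rows, the synthetic points always stay inside the range of the observed features, which is what distinguishes this approach from simply duplicating rows as upsampling does.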
Our first task is to load the package libraries and bring in the data:
> library(mlr)
> library(ggplot2)
> library(HDclassif)
> library(DMwR)
> library(reshape2)
> library(corrplot)
> data(wine)
> table(wine$class)

 1  2  3 
59 71 48 
We have 178 observations, and the response labels are numeric (1, 2, and 3). Let's more than double...