10.6 HOW TO PERFORM k‐MEANS CLUSTERING USING R
Read in the white_wine_training data set as wine_train, then subset the predictor variables into their own data frame.
X <- subset(wine_train, select = c("alcohol", "sugar"))
The subset() command selects the two variables named alcohol and sugar from the wine_train data set, and the assignment operator stores them under their own name, X.
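The text does not show the read-in command or the file's location, so the sketch below builds a small hypothetical stand-in for wine_train directly (all values, and the extra quality column, are made up for illustration; with the real file you would read it in instead, for example with read.csv()). It shows that subset() with select = keeps only the named columns and returns a data frame, and that ordinary bracket indexing by column name gives the same result.

```r
# Hypothetical stand-in for wine_train: made-up values, NOT the real
# white_wine_training data. The quality column is also hypothetical and is
# included only to show that subset() drops columns not named in select.
wine_train <- data.frame(
  alcohol = c(8.8, 9.5, 10.1, 12.4, 11.0, 9.9),
  sugar   = c(20.7, 1.6, 6.9, 1.2, 4.4, 8.5),
  quality = c("Low", "High", "Low", "High", "High", "Low")
)

# subset() with select = keeps only the named columns, in the order given
X <- subset(wine_train, select = c("alcohol", "sugar"))

# Equivalent base-R indexing by column name
X_alt <- wine_train[, c("alcohol", "sugar")]

print(class(X))           # a data frame, not a matrix
print(dim(X))             # same number of rows, two columns
print(all.equal(X, X_alt))
```

Either form works; subset() reads a little more like plain English, while bracket indexing is the more general tool.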
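Now, we standardize both predictor variables and save the output as a data frame. The scale() command returns a matrix, so we convert it with as.data.frame(). Note that kmeans() itself accepts either a numeric matrix or a data frame with all-numeric columns, so the conversion is a convenience rather than a requirement.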
Xs <- as.data.frame(scale(X))
colnames(Xs) <- c("alcohol_z", "sugar_z")
The scale() command turns each variable in X into its z-scores by subtracting the column mean and dividing by the column standard deviation, and as.data.frame() converts the resulting matrix into a data frame, which is saved as Xs. We then rename the columns using colnames() to emphasize that the variables are now standardized.
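To make the standardization step concrete, the sketch below computes the z-scores by hand, z = (x - mean(x)) / sd(x), and checks that they match the output of scale(). It uses a small hypothetical stand-in for X (made-up alcohol and sugar values, not the white wine data), and confirms that each standardized column has mean 0 and standard deviation 1.

```r
# Hypothetical toy values standing in for X (NOT the real wine data)
X <- data.frame(
  alcohol = c(8.8, 9.5, 10.1, 12.4, 11.0, 9.9),
  sugar   = c(20.7, 1.6, 6.9, 1.2, 4.4, 8.5)
)

# Same steps as in the text: standardize, convert to a data frame, rename
Xs <- as.data.frame(scale(X))
colnames(Xs) <- c("alcohol_z", "sugar_z")

# The same z-scores computed by hand: subtract the mean, divide by the
# sample standard deviation (sd() uses the n - 1 denominator, as scale() does)
z_manual <- data.frame(
  alcohol_z = (X$alcohol - mean(X$alcohol)) / sd(X$alcohol),
  sugar_z   = (X$sugar   - mean(X$sugar))   / sd(X$sugar)
)

# Each standardized column should have mean 0 (up to rounding) and sd 1
print(round(colMeans(Xs), 10))
print(sapply(Xs, sd))

# The hand-computed z-scores should agree with scale()
print(all.equal(Xs, z_manual, check.attributes = FALSE))
```

Standardizing matters for k-means because the algorithm relies on distances: without it, whichever variable has the larger numeric spread would dominate the cluster assignments.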
The kmeans() command is included in the base installation of R. However, if...