10.3 AN APPLICATION OF k‐MEANS CLUSTERING
We apply the k‐means clustering algorithm to the white_wine_training and the white_wine_test data sets. These data sets are adapted from the Wine Quality data set at UCI.3The data consist of chemical and quality characteristics of a collection of Portuguese white wines. The predictors are alcohol and sugar. The target variable is quality, a measure of how good the wine is, according to a professional taster. When constructing clusters, it is important to not include the target variable as an input to the clustering algorithm. Doing so would bias the results if we later use the clusters to predict the target. It is also important to standardize or normalize all the predictors, so that the greater variability of one predictor does not dominate the cluster construction process.
Now, the k‐means algorithm requires the analyst to specify the desired number of clusters. For simplicity, we specify k = 2 clusters and proceed to apply...