10.4 CLUSTER VALIDATION
Cluster solutions should be validated. Since no predictions were made using the training data set, we simply reapply the k-means algorithm, this time to the white_wine_test data set, and compare the results with those obtained from the training set. Table 10.2 contains the resulting mean variable values, by cluster. As shown in Table 10.3, the differences in mean values (training set minus test set) are relatively small. Analysts wishing further validation may perform two-sample t-tests on the cluster means here.
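The validation procedure described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code that produced the tables in this chapter: the data are synthetic stand-ins generated by the hypothetical helper make_fake_wines, the column names Sugar_z and Alcohol_z are taken from the table, and the helper cluster_means is our own naming. Because k-means assigns cluster numbers arbitrarily, the sketch relabels the clusters so that Cluster 1 is always the one with the higher mean sugar before the two solutions are compared.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.cluster import KMeans

COLS = ["Sugar_z", "Alcohol_z"]


def make_fake_wines(n, rng):
    """Synthetic stand-in for standardized wine data (NOT the book's data)."""
    n_sweet = n // 3
    sweet = rng.normal(loc=[1.0, -1.0], scale=0.5, size=(n_sweet, 2))
    dry = rng.normal(loc=[-0.5, 0.5], scale=0.5, size=(n - n_sweet, 2))
    return pd.DataFrame(np.vstack([sweet, dry]), columns=COLS)


def cluster_means(df, k=2, seed=0):
    """Fit k-means; return (labels, per-cluster means).

    Clusters are renumbered 1..k in order of decreasing mean Sugar_z,
    since the label numbers k-means assigns are arbitrary.
    """
    X = df[COLS]
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    raw = pd.Series(km.labels_, index=df.index)
    order = X.groupby(raw)["Sugar_z"].mean().sort_values(ascending=False).index
    relabel = {old: new for new, old in enumerate(order, start=1)}
    labels = raw.map(relabel).rename("cluster")
    return labels, X.groupby(labels).mean()


rng = np.random.default_rng(42)
train = make_fake_wines(3000, rng)  # stands in for the training set
test = make_fake_wines(1500, rng)   # stands in for the test set

train_labels, train_means = cluster_means(train)
test_labels, test_means = cluster_means(test)

# Difference in cluster means, training minus test
diff = train_means - test_means
print(diff.round(3))

# Optional further validation: Welch two-sample t-test per cluster and variable
for c in (1, 2):
    for col in COLS:
        t, p = stats.ttest_ind(
            train.loc[train_labels == c, col],
            test.loc[test_labels == c, col],
            equal_var=False,
        )
        print(f"cluster {c}, {col}: t = {t:.2f}, p = {p:.3f}")
```

With real data, train and test would be replaced by the standardized training and test data frames. Small differences in diff, together with non-significant t-tests, would suggest that the cluster solution generalizes to the test set.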
TABLE 10.2 Mean variable value, by cluster, for the white_wine_test data set
Variable | Cluster 1: 638 wines “Sweet Wines” | Cluster 2: 1122 wines “Dry Wines” |
Sugar_z | 1.07 | −0.61 |
Alcohol_z | −0.80 | 0.46 |
The Python results in Figure 10.3 are used for this table. The cluster labels “Cluster 1” and “Cluster 2” were reversed, for ease of interpretation...