EXERCISES
CLARIFYING THE CONCEPTS
- Explain what clustering is trying to accomplish, using the concepts of between‐cluster variation and within‐cluster variation.
- Does clustering seek to group records or variables?
- Why is it helpful to apply clustering fairly early in the modeling process?
- True or false: k‐means clustering automatically selects the optimal number of clusters.
- Why do we omit the target variable as an input to the clustering algorithm?
- Explain how we perform cluster validation.
- Why do we standardize the numerical predictors prior to clustering?
- What is perhaps the most important cluster validation method?
- What is the centroid of the points (1, 5), (2, 4), and (3, 3)?
- Provide an example of clustering in the everyday world that is not discussed in this chapter.
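As a reminder for the centroid exercise above, a centroid is the coordinate-wise mean of a set of points. The following is a minimal sketch in plain Python. The function name centroid and the example points are hypothetical, chosen only for illustration, and are deliberately different from the points in the exercise.

```python
def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    # zip(*points) groups the first coordinates together, then the second, and so on.
    return tuple(sum(coords) / n for coords in zip(*points))

# Hypothetical points for illustration only (not the ones in the exercise).
example_points = [(0, 0), (4, 0), (2, 6)]
print(centroid(example_points))  # (2.0, 2.0)
```

The same function should work for points with any number of dimensions, provided every point has the same number of coordinates.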
WORKING WITH THE DATA
For the following exercises, work with the white_wine_training and white_wine_test data sets. Use either Python or R to solve each problem.
- Input and...