10.5 HOW TO PERFORM k‐MEANS CLUSTERING USING PYTHON
Load the required packages.
import pandas as pd
from scipy import stats
from sklearn.cluster import KMeans
Read in the white_wine_training data set as wine_train.
wine_train = pd.read_csv("C:/.../white_wine_training")
For simplicity, let us isolate the predictor variables and save them as X.
X = wine_train[['alcohol', 'sugar']]
Once we have our predictor variables, standardize them using the z‐score transformation and save the result as a data frame.
Xz = pd.DataFrame(stats.zscore(X), columns=['alcohol', 'sugar'])
As in Chapter 3, the stats.zscore() command converts the variables in X into their z‐scores. The DataFrame() command then stores the standardized variables as a data frame, and the optional input columns assigns the given names to its columns. We save the result as Xz.
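To see what the standardization step does, the following minimal sketch applies the same two commands to a small made-up data frame. The names toy and toy_z and the four data values are hypothetical and are not part of the wine data. Note that stats.zscore() divides by the population standard deviation (ddof=0) by default, so the check below uses std(ddof=0) rather than the pandas default.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data standing in for the two predictors (not the wine data)
toy = pd.DataFrame({'alcohol': [9.0, 10.0, 11.0, 12.0],
                    'sugar': [1.0, 5.0, 9.0, 13.0]})

# Same pattern as in the text: z-score each column, then rebuild a data frame
toy_z = pd.DataFrame(stats.zscore(toy), columns=['alcohol', 'sugar'])

# After standardization each column should have mean 0 and
# (population) standard deviation 1
col_means = toy_z.mean()
col_sds = toy_z.std(ddof=0)
print(col_means)
print(col_sds)
```

Each standardized column is centered at zero and has unit spread, which keeps a variable measured on a larger scale from dominating the distance calculations used in clustering.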
Now, we run k‐means clustering on the training data set.
kmeans01...
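As a rough illustration of what fitting k‐means to standardized predictors can look like, the sketch below uses synthetic data rather than the wine data. Everything specific in it is an assumption made for the example: the names toy, toy_z, and km_sketch, the two simulated groups, the choice of n_clusters=2, and the random seeds. It is not the model fit in this section.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.cluster import KMeans

# Synthetic data: two well-separated groups of 20 records each (assumption)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[9.0, 12.0], scale=0.3, size=(20, 2))
group_b = rng.normal(loc=[12.0, 2.0], scale=0.3, size=(20, 2))
toy = pd.DataFrame(np.vstack([group_a, group_b]),
                   columns=['alcohol', 'sugar'])

# Standardize exactly as in the text
toy_z = pd.DataFrame(stats.zscore(toy), columns=['alcohol', 'sugar'])

# Fit k-means; n_clusters=2 and random_state=0 are illustrative choices
km_sketch = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_z)

labels = km_sketch.labels_             # cluster membership of each record
centers = km_sketch.cluster_centers_   # one row of z-score centers per cluster
print(pd.Series(labels).value_counts())
print(centers)
```

The labels_ attribute gives the cluster assigned to each record, and cluster_centers_ gives the center of each cluster in standardized units.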