Practical Machine Learning with R
Solution:
library(cluster)
library(factoextra)
df <- read.csv("mtcars.csv")
rownames(df) <- df$X
df$X <- NULL
The row names (car models) become a column, X, when you save the data as a CSV file. So, we need to change it back, as the row names are used as labels in the plot in step 7.
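The round trip described above can be checked with a minimal sketch. It assumes R's built-in mtcars dataset and a temporary file instead of the mtcars.csv file used in this solution; the names tmp, raw, and restored are illustrative only.

```r
# Sketch only: uses the built-in mtcars data and a temporary file,
# not the mtcars.csv file from the activity.
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp)            # row names are written as an unnamed first column

raw <- read.csv(tmp)
names(raw)[1]                     # "X": the row names came back as a column

# Option 1: restore the row names by hand, as in the solution
rownames(raw) <- raw$X
raw$X <- NULL

# Option 2: let read.csv() treat the first column as row names
restored <- read.csv(tmp, row.names = 1)
```

Both options leave the car models as row names. The second simply avoids the intermediate X column.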
df <- na.omit(df)
df <- scale(df)
dv <- diana(df, metric = "manhattan", stand = TRUE)
plot(dv)
The output is as follows:

The next plot is as follows:

agn <- agnes(df)
pltree(agn)
The output is as follows:

fviz_nbclust(df, kmeans, method = "wss") +
  geom_vline(xintercept = 4, linetype = 2) +
  labs(subtitle = "Elbow method")
The output is as follows:

k4 <- kmeans(df, centers = 4, nstart = 20)
fviz_cluster(k4, data = df)
The output is as follows:

If we consider cutting the DIANA tree at height 20, the Ferrari is clustered together with the Ford and the Maserati (the smallest cluster).
Meanwhile, cutting the AGNES dendrogram at height 4 results in the Ferrari being clustered with the Mazda RX4, the Mazda RX4 Wag, and the Porsche. k-means clusters the Ferrari with the Mazdas, the Ford, and the Maserati.
Clearly, the choice of clustering technique affects which clusters are created: the same car can end up with different neighbors under each method. It is important to apply some domain knowledge to decide which result is the most useful.
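The comparison above can be reproduced in code rather than read off the plots. The following is a minimal sketch that assumes the built-in mtcars dataset (instead of mtcars.csv) and the cut heights mentioned in the text. The helper cluster_mates() and the seed value are hypothetical additions, and the k-means groups may differ between runs with a different seed.

```r
library(cluster)

# Sketch only: the built-in mtcars data stands in for mtcars.csv.
df <- scale(na.omit(mtcars))

dv  <- diana(df, metric = "manhattan", stand = TRUE)
agn <- agnes(df)

set.seed(42)  # arbitrary seed, only to make the k-means run repeatable
k4 <- kmeans(df, centers = 4, nstart = 20)

# Cut each tree at the heights discussed in the text.
# as.hclust() converts the diana/agnes objects so that cutree() can be used.
groups <- list(
  diana  = cutree(as.hclust(dv),  h = 20),
  agnes  = cutree(as.hclust(agn), h = 4),
  kmeans = k4$cluster
)

# Hypothetical helper: names of all cars sharing a cluster with `car`
cluster_mates <- function(assignment, car) {
  names(assignment)[assignment == assignment[[car]]]
}

# One character vector of cluster mates per method
ferrari_mates <- lapply(groups, cluster_mates, car = "Ferrari Dino")
ferrari_mates
```

Printing ferrari_mates lists, for each method, the cars that share a cluster with the Ferrari Dino, which makes the differences between the three methods easy to compare side by side.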