An alternative to PCA for exploring how the observations group together is k-means clustering, an unsupervised method that partitions the data into k clusters by assigning each observation to the cluster with the nearest mean (centroid); each centroid serves as a prototype of its cluster. We can perform k-means clustering with the kmeans() function and plot the results with plot3d() as follows:
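> library(rgl)    # provides plot3d()
> set.seed(44)    # kmeans() picks random starting centers; fix the seed for reproducibility
> cl <- kmeans(fish.data[,1:3], 5)
> fish.data$cluster <- as.factor(cl$cluster)
> plot3d(fish.log.pca$x[,1:3], col=fish.data$cluster, main="k-means clusters")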
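To make the "assign to the nearest mean, then recompute the means" idea concrete, here is a minimal sketch of Lloyd's algorithm in base R. It is an illustration only: kmeans() uses the Hartigan-Wong algorithm by default, and the helper name simple_kmeans and the toy data are hypothetical, not part of the fish example.

```r
# Hypothetical helper: a minimal Lloyd's-algorithm k-means sketch.
# (kmeans() itself defaults to the Hartigan-Wong algorithm.)
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Initialize centroids with k observations chosen at random
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from every point
    # (rows of the n x k matrix d) to every centroid (columns)
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    if (all(new_cluster == cluster)) break  # assignments unchanged: converged
    cluster <- new_cluster
    # Update step: move each centroid to the mean of its assigned points
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

# Usage on two well-separated toy groups of 10 points each
set.seed(1)
toy <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 10), ncol = 2))
res <- simple_kmeans(toy, 2)
print(res$cluster)
```

Each iteration can only lower (or keep) the total within-cluster sum of squares, which is why the loop stops once no observation changes cluster. Because the result depends on the random starting centroids, setting a seed, as done with set.seed(44) above, matters for reproducibility.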
Note
The colors assigned to the groups differ from those in the 3D plot of the PCA results. However, the overall distribution of the groups is similar.
Let's now evaluate how well the clustering recovers the fish species by cross-tabulating cluster membership against species:
> with(fish.data, table(cluster, fish))
       fish
cluster Bluegill Bowfin Carp Goldeye Largemouth_Bass
      1        0      0   14      39              18
      2        0     27   12       0              22
      3        0     23   13 ...
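A cross-tabulation like this can also be condensed into a single number. One common choice, introduced here as an assumption rather than a measure used in this text, is cluster purity: the fraction of observations that belong to the majority species of their cluster (1 means every cluster holds a single species). The helper name cluster_purity is hypothetical.

```r
# Hypothetical helper: cluster purity from a cluster-by-label contingency table.
cluster_purity <- function(cluster, label) {
  tab <- table(cluster, label)
  # For each cluster (row), count the observations of its most common label,
  # then divide the total by the number of observations
  sum(apply(tab, 1, max)) / sum(tab)
}

# Toy check: cluster 1 holds a, a, b (majority 2); cluster 2 holds b, b, b (majority 3)
print(cluster_purity(c(1, 1, 1, 2, 2, 2), c("a", "a", "b", "b", "b", "b")))  # 5/6

# Applied to the fish data, assuming the columns used above:
# cluster_purity(fish.data$cluster, fish.data$fish)
```

Purity is easy to read but rises trivially as k grows (with one observation per cluster it reaches 1), so it is best used to compare clusterings with the same number of clusters.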