Now we can cluster the term-document matrix using k-means. For illustration, we will ask for five clusters:
kmeans5 <- kmeans(dtms, 5)
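Because k-means starts from randomly chosen centers, repeated runs can assign different cluster numbers to the same rows. The following is a minimal sketch of one way to make a run repeatable and to check the cluster sizes. It uses a small random count matrix, toy_dtm, as a hypothetical stand-in for dtms; the seed and the nstart and iter.max values are illustrative assumptions, not settings taken from the original analysis.

```r
# Hypothetical stand-in for dtms: 200 "documents" by 6 "terms" of random counts
set.seed(42)  # fix the random starting centers so the run is repeatable
toy_dtm <- matrix(rpois(200 * 6, lambda = 1), nrow = 200, ncol = 6,
                  dimnames = list(NULL, paste0("term", 1:6)))

# Five clusters; nstart tries several random starts and keeps the best fit
toy_km <- kmeans(toy_dtm, centers = 5, nstart = 10, iter.max = 50)

# One cluster label per row, and the number of rows in each cluster
head(toy_km$cluster)
toy_km$size
```

The same checks on kmeans5 (kmeans5$size, or table(kmeans5$cluster)) show how evenly the real data was divided among the five clusters.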
Once k-means is done, we will append the cluster number to the original data, and then create five subsets based upon the cluster:
kw_with_cluster <- as.data.frame(cbind(OnlineRetail, Cluster = kmeans5$cluster))
# subset the five clusters
cluster1 <- subset(kw_with_cluster, subset = Cluster == 1)
cluster2 <- subset(kw_with_cluster, subset = Cluster == 2)
cluster3 <- subset(kw_with_cluster, subset = Cluster == 3)
cluster4 <- subset(kw_with_cluster, subset = Cluster == 4)
cluster5 <- subset(kw_with_cluster, subset = Cluster == 5)
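The repeated subset() calls can also be written as a single split() call, which returns a list with one data frame per cluster. This is only a sketch of that alternative, not the approach used in the text. The data frame toy_retail and the vector toy_cluster are hypothetical stand-ins for OnlineRetail and kmeans5$cluster.

```r
# Hypothetical stand-ins for OnlineRetail and kmeans5$cluster
toy_retail <- data.frame(
  Description = c("red mug", "blue mug", "red lamp",
                  "paper bag", "blue lamp", "gift bag"),
  stringsAsFactors = FALSE
)
toy_cluster <- c(1, 2, 1, 3, 2, 3)

# Append the cluster number, as in the text
toy_with_cluster <- cbind(toy_retail, Cluster = toy_cluster)

# One list element per cluster, named by cluster number
clusters <- split(toy_with_cluster, toy_with_cluster$Cluster)

clusters[["1"]]         # rows assigned to cluster 1
sapply(clusters, nrow)  # number of rows in each cluster
```

With the real data, split(kw_with_cluster, kw_with_cluster$Cluster) would give the same rows as cluster1 through cluster5, and it keeps working unchanged if the number of clusters changes.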