In what follows, we will learn more about partition clustering with k-means while exploring a dataset from the cluster.datasets
package. This package contains the datasets published in the book Clustering Algorithms by Hartigan (1975), along with examples of analyses. Let's start by installing the package on your machine and loading it:
install.packages("cluster.datasets")
library(cluster.datasets)
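If the package might already be present on your machine, the installation step can be skipped. The following is a minimal sketch of that idea; the helper name ensure_package is hypothetical and is not part of the book or of any package. It relies only on the base R functions requireNamespace() and install.packages().

```r
# Hypothetical helper (an assumption for illustration):
# install a package only when it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Only reached when the package is missing; needs a configured CRAN mirror.
    install.packages(pkg)
  }
  # Return TRUE if the package namespace can now be loaded, FALSE otherwise.
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}
```

Calling ensure_package("cluster.datasets") before library(cluster.datasets) would then avoid reinstalling the package on every run.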
We will first focus on getting to know the data, scaling the data to a common scale, and interpreting the clusters. Our first exploration concerns the crime rates in different US cities in 1970. The all.us.city.crime.1970 dataset
lets us carry out this investigation:
data(all.us.city.crime.1970)
crime = all.us.city.crime.1970
Let's investigate the attributes in the dataset:
ncol(crime)
names(crime)
summary(crime)
There are 10 attributes. A look at the R manual page (type ?all.us.city.crime.1970...