Random forests can be fitted without a target variable, that is, in unsupervised mode. Using this feature, we compute the proximity matrix from the out-of-bag (OOB) observations. Since the proximity matrix measures how close the observations are to one another, it can be turned into clusters using hierarchical clustering methods.
We begin by setting y = NULL in the randomForest function. The options proximity=TRUE and oob.prox=TRUE are specified to ensure that we obtain the required proximity matrix:
> library(randomForest)
> library(factoextra)  # provides the multishapes data
> data(multishapes)
> par(mfrow=c(1,2))
> plot(multishapes[1:2],col=multishapes[,3],
+      main="Six Multishapes Data Display")
> MS_RF <- randomForest(x=multishapes[1:2],y=NULL,ntree=1000,
+      proximity=TRUE, oob.prox=TRUE,mtry = 1)
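Before clustering, it can help to look at what the fitted object actually holds. The following is a minimal sketch rather than part of the original analysis: it assumes the built-in iris data in place of multishapes so that it runs without extra packages, and the names rf, prox, and d, the seed, and ntree = 500 are illustrative choices. It fits an unsupervised forest, inspects the proximity matrix, and converts it to a dissimilarity object using 1 - proximity, which is one common conversion.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, used only to make this sketch repeatable

# No y is supplied, so randomForest runs in unsupervised mode
rf <- randomForest(x = iris[, 1:4], ntree = 500,
                   proximity = TRUE, oob.prox = TRUE)

prox <- rf$proximity
dim(prox)     # one row and one column per observation
range(prox)   # proximities are similarities between 0 and 1

# Proximity is a similarity; 1 - proximity gives a dissimilarity
# in the "dist" form that clustering functions such as hclust accept
d <- as.dist(1 - prox)
class(d)
```

The same inspection applies to MS_RF$proximity from the call above, only with one row per multishapes observation.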
Next, we use the hclust function with the ward.D2 option to carry out the hierarchical cluster analysis on the dissimilarities derived from the proximity matrix (hclust expects dissimilarities, whereas proximities are similarities). The cutree function then divides the hclust object into k = 6 clusters. Finally...