## Chapter 2: Advanced Clustering Methods

### Activity 5: Implementing k-modes Clustering on the Mushroom Dataset

Solution:

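Download **mushrooms.csv** from https://github.com/TrainingByPackt/Applied-Unsupervised-Learning-with-R/blob/master/Lesson02/Activity05/mushrooms.csv.

After downloading, load the **mushrooms.csv** file in R. In R 4.0.0 and later, **read.csv** no longer converts text columns to factors by default, so request this explicitly:

ms<-read.csv('mushrooms.csv', stringsAsFactors = TRUE)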

Check the dimensions of the dataset:

dim(ms)

The output is as follows:

[1] 8124 23

Check the distribution of all columns:

summary.data.frame(ms)

For each column, the output lists the labels it contains and the count of each.

Store all the columns of the dataset, except for the class label (the first column, **class**), in a new variable, **ms_k**:

ms_k<-ms[,2:23]

Install and load the **klaR** library, which has the **kmodes** function:

install.packages('klaR')

library(klaR)

Calculate the **kmodes** clusters and store them in a **kmodes_ms** variable. Enter the dataset without the **true** labels as the first parameter and the number of clusters as the second parameter:

kmodes_ms<-kmodes(ms_k,2)
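To make the idea behind k-modes more concrete, here is a minimal base R sketch of its two building blocks: a per-column mode and the simple-matching dissimilarity (the number of attributes on which a record differs from the mode). The helper names `col_mode` and `simple_matching` and the toy data are hypothetical and only for illustration; this is not how **klaR** implements **kmodes** internally.

```r
# Hypothetical helper: number of attributes on which two records differ
simple_matching <- function(a, b) sum(a != b)

# Hypothetical helper: most frequent label in a categorical column
col_mode <- function(x) names(which.max(table(x)))

# Toy categorical data (made up for illustration, not the mushroom dataset)
toy <- data.frame(
  cap  = c("x", "x", "b", "b", "x"),
  odor = c("n", "n", "n", "f", "a"),
  stringsAsFactors = FALSE
)

# The mode of each column plays the role that the mean plays in k-means
modes <- vapply(toy, col_mode, character(1))

# Dissimilarity of every record to the vector of modes
d <- unname(apply(toy, 1, simple_matching, b = modes))

print(modes)
print(d)
```

k-modes alternates between assigning each record to the cluster whose mode vector is nearest under this dissimilarity and recomputing the modes, much as k-means alternates between assignment and recomputing means.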

Check the results by creating a table of **true** labels and **cluster** labels:

result = table(ms$class, kmodes_ms$cluster)

result

The output is as follows:

|   | 1 | 2 |
|---|---|---|
| e | 80 | 4128 |
| p | 3052 | 864 |

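As you can see, most of the edible mushrooms (4,128 of 4,208) are in cluster 2 and most of the poisonous mushrooms (3,052 of 3,916) are in cluster 1. Overall, about 88% of the mushrooms (7,180 of 8,124) fall in the cluster dominated by their own class, so k-modes clustering has done a reasonable job of separating edible from poisonous mushrooms without seeing the labels. Because the initial cluster modes are chosen at random, the cluster numbering and the exact counts can differ between runs.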
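One way to put a number on "reasonable" is cluster purity: the share of records that fall in the cluster dominated by their own class. The sketch below rebuilds the table from the counts shown above rather than rerunning **kmodes**, so it runs without the dataset; the variable name `purity` is an illustrative choice.

```r
# Contingency table rebuilt from the counts reported above
# (rows: true class, columns: cluster)
result <- matrix(
  c(80, 3052, 4128, 864),
  nrow = 2,
  dimnames = list(class = c("e", "p"), cluster = c("1", "2"))
)

# Purity: for each cluster take the size of its largest class,
# then divide the total by the number of records
purity <- sum(apply(result, 2, max)) / sum(result)

print(result)
print(round(purity, 3))
```

With a **result** table produced by a fresh run, the same last two lines apply unchanged, although the value will vary with the random initialization.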

### Activity 6: Implementing DBSCAN and Visualizing the Results

Solution:

Import the **dbscan** and **factoextra** libraries:

library(dbscan)

library(factoextra)

Import the **multishapes** dataset:

data(multishapes)

Put the first two columns of the **multishapes** dataset (the point coordinates) in the **ms** variable:

ms<-multishapes[,1:2]

Plot the dataset as follows:

plot(ms)

The output is as follows:

Perform k-means clustering on the dataset and plot the results:

km.res<-kmeans(ms,4)

fviz_cluster(km.res, ms,ellipse = FALSE)

The output is as follows:

Perform DBSCAN on the **ms** variable and plot the results:

db.res<-dbscan(ms,eps = .15)

fviz_cluster(db.res, ms,ellipse = FALSE,geom = 'point')

The output is as follows:

Here, you can see that all the points in black are anomalies (noise points) that do not belong to any cluster. The clusters found by DBSCAN follow the density of the data and so take arbitrary shapes and sizes, which a centroid-based method such as k-means cannot recover: k-means assigns each point to its nearest center, so its clusters tend to be compact and roughly spherical.
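The notion of density that DBSCAN relies on can be sketched in base R: a point is a core point if at least `min_pts` points (counting itself) lie within distance `eps` of it, and a point with too few neighbors that is also not near any core point is noise. The toy coordinates and the value `min_pts <- 5` are assumptions for illustration; the sketch shows only the core-point test, not the full cluster expansion that the **dbscan** package performs.

```r
eps <- 0.15      # neighborhood radius, as in the activity
min_pts <- 5     # assumed minimum neighborhood size (counting the point itself)

# Toy data: a dense 3 x 3 grid of points plus one isolated point
blob <- expand.grid(x = c(0, 0.05, 0.1), y = c(0, 0.05, 0.1))
pts <- rbind(blob, data.frame(x = 2, y = 2))

# Pairwise Euclidean distances
d <- as.matrix(dist(pts))

# Number of points within eps of each point (the diagonal counts the point itself)
n_within <- unname(rowSums(d <= eps))

# Core points have a sufficiently dense neighborhood
is_core <- n_within >= min_pts

print(n_within)
print(is_core)
```

In this toy example, every grid point is a core point because the whole grid fits within one `eps` radius, while the isolated point has only itself in its neighborhood and would be labeled as noise.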

### Activity 7: Performing a Hierarchical Cluster Analysis on the Seeds Dataset

Solution:

Read the downloaded file into the **sd** variable:

sd<-read.delim('seeds_dataset.txt')

### Note

Make changes to the path as per the location of the file on your system.
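The label column is later referred to as **sd$X1**. That name is a side effect of **read.delim**, which treats the first line of the file as a header by default. The sketch below illustrates this with a few inline rows; it assumes the seeds file has no header line (which the **X1** name suggests), and the three rows are illustrative values rather than the full dataset.

```r
# Three tab-separated rows with no header line (illustrative values)
txt <- "15.26\t14.84\t1\n14.88\t14.57\t1\n13.84\t13.94\t2\n"

# Default behavior: the first row is consumed as column names
with_header <- read.delim(text = txt)

# With header = FALSE, all rows are kept and columns are named V1, V2, ...
no_header <- read.delim(text = txt, header = FALSE)

print(names(with_header))
print(nrow(with_header))
print(names(no_header))
print(nrow(no_header))
```

Under that assumption, the default call drops the first observation from the data. Reading with `header = FALSE` keeps it, but the label column would then be named after its position (for example **V8** in an eight-column file) instead of **X1**.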

First, put all the columns of the dataset other than the final labels into the **sd_c** variable:

sd_c<-sd[,1:7]

Import the **cluster** library:

library(cluster)

Calculate the hierarchical clusters and plot the dendrogram:

h.res<-hclust(dist(sd_c),"ave")

plot(h.res)

The output is as follows:

Cut the tree at **k=3** and print a table to see how well the clustering has classified the three types of seeds:

memb <- cutree(h.res, k = 3)

results<-table(sd$X1,memb)

results

The output is as follows:

Perform divisive clustering on the **sd_c** dataset and plot the dendrogram:

d.res<-diana(sd_c,metric ="euclidean")

plot(d.res)

The output is as follows:

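Cut the divisive tree at **k=3** and print a table to see how well the clustering has classified the three types of seeds. The **diana** result must first be converted with **as.hclust** so that **cutree** can cut it; cutting **h.res** again would only reproduce the earlier table:

memb <- cutree(as.hclust(d.res), k = 3)

results<-table(sd$X1,memb)

results

The output is as follows:

Compare this table with the one from the agglomerative tree to see how closely the two methods agree on this dataset. Divisive clustering works in the opposite direction to agglomerative clustering: it starts with all the points in one cluster and splits it repeatedly, whereas agglomerative clustering starts with single points and merges them. Both are forms of hierarchical clustering.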
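For a comparison of the two approaches that runs without the seeds file, the following sketch applies the same steps to R's built-in **iris** data. The dataset is a stand-in chosen for illustration, and the names `memb_agg`, `memb_div`, and `agreement` are hypothetical. It cuts each tree separately and cross-tabulates the two sets of cluster labels.

```r
library(cluster)  # provides diana()

# Stand-in data: the four numeric iris measurements (not the seeds dataset)
x <- iris[, 1:4]

# Agglomerative clustering with average linkage, cut into three groups
h.res <- hclust(dist(x), method = "average")
memb_agg <- cutree(h.res, k = 3)

# Divisive clustering; convert to an hclust object before cutting
d.res <- diana(x, metric = "euclidean")
memb_div <- cutree(as.hclust(d.res), k = 3)

# Cross-tabulate the two labelings to see where they agree
agreement <- table(agglomerative = memb_agg, divisive = memb_div)
print(agreement)

# Compare each labeling with the known species
print(table(iris$Species, memb_agg))
print(table(iris$Species, memb_div))
```

If the two methods produced the same partition, each row of `agreement` would have a single non-zero cell; any spread across cells shows where the bottom-up and top-down trees group points differently.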