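Another common way to cluster data is hierarchically. This involves either repeatedly splitting the dataset into smaller groups (the divisive, or top-down, approach) or building the clusters up by repeatedly merging the data points or clusters that are closest to each other (the agglomerative, or bottom-up, approach).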
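To make the bottom-up approach concrete, here is a minimal sketch in plain Clojure that builds clusters up by merging the closest pair at each step. It is not Weka's implementation. The function names (euclidean, cluster-distance, closest-pair, merge-step, agglomerate), the use of single linkage, and the rule of stopping at a target cluster count k are all assumptions made for illustration.

```clojure
;; Illustrative sketch only: naive single-linkage agglomerative clustering.
;; Every name below is hypothetical and unrelated to Weka's API.

(defn euclidean
  "Euclidean distance between two equal-length numeric vectors."
  [a b]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b))))

(defn cluster-distance
  "Single linkage (an assumption): the distance between the closest
  pair of points, one taken from each cluster."
  [c1 c2]
  (apply min (for [p c1, q c2] (euclidean p q))))

(defn closest-pair
  "Returns the indices [i j], with i < j, of the two closest clusters.
  Requires at least two clusters."
  [clusters]
  (apply min-key
         (fn [[i j]] (cluster-distance (nth clusters i) (nth clusters j)))
         (for [i (range (count clusters))
               j (range (inc i) (count clusters))]
           [i j])))

(defn merge-step
  "Replaces the two closest clusters with their union."
  [clusters]
  (let [[i j]  (closest-pair clusters)
        merged (into (nth clusters i) (nth clusters j))
        others (keep-indexed (fn [idx c] (when-not (#{i j} idx) c)) clusters)]
    (conj (vec others) merged)))

(defn agglomerate
  "Starts with one cluster per point and merges until at most k
  clusters remain. Assumes k is at least 1."
  [points k]
  (->> (mapv vector points)
       (iterate merge-step)
       (drop-while #(> (count %) k))
       first))

;; Usage: two tight pairs and one distant point, reduced to three clusters.
(def sample-points
  [[0.0 0.0] [0.0 1.0] [10.0 10.0] [10.0 12.0] [5.0 30.0]])

(def sample-clusters (agglomerate sample-points 3))
```

With this sample data, the two tight pairs should end up grouped together and the distant point should remain in a cluster of its own. The sketch recomputes every pairwise distance at each step, so it only suits very small inputs. That cost is part of why hierarchical clustering is usually demonstrated on small datasets.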
Weka has a class, HierarchicalClusterer, for performing hierarchical clustering. We'll use the defanalysis macro that we created in the Discovering groups of data using K-means clustering recipe to create a wrapper function for this analysis as well.
We'll use the same project.clj dependencies that we did in the Loading CSV and ARFF data into Weka recipe, along with the following set of imports:
(import [weka.core EuclideanDistance]
        [weka.clusterers HierarchicalClusterer])
(require '[clojure.string :as str])
Because hierarchical clustering can be memory-intensive, we'll use the Iris dataset, which is fairly small. The easiest way to get this dataset is to download it from http://www.ericrochester.com/clj-data-analysis/data/UCI/iris.arff...