Using a hierarchical model to cluster your data
The hierarchical clustering model builds a hierarchy of clusters. Conceptually, you might think of it as a decision tree of clusters: based on the similarity (or dissimilarity) between them, clusters are either aggregated into more general clusters or divided into more specific ones. The agglomerative approach is often referred to as bottom-up, while the divisive approach is called top-down.
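To make the bottom-up idea concrete, here is a minimal sketch that uses SciPy's linkage and fcluster functions. The five one-dimensional points and the variable names are made up for illustration and are not part of the recipe's dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# five hypothetical one-dimensional observations
points = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])

# bottom-up: start from single points and repeatedly
# merge the two closest clusters (single linkage)
merges = linkage(points, method='single')

# cut the same hierarchy at two levels of generality
two_clusters = fcluster(merges, t=2, criterion='maxclust')
three_clusters = fcluster(merges, t=3, criterion='maxclust')
```

Cutting the hierarchy into two clusters separates the point at 20 from the rest; cutting it into three also splits the points near 0 from those near 5. The more general clusters are unions of the more specific ones.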
Getting ready
To execute this recipe, you will need pandas, SciPy, and PyLab. No other prerequisites are required.
How to do it…
Hierarchical clustering can be extremely slow for big datasets, as the complexity of the agglomerative algorithm is O(n^3). To estimate our model, we use a single-linkage algorithm that has better complexity, O(n^2), but can still be very slow for large datasets (the clustering_hierarchical.py file):
def findClusters_link(data):
    '''
        Cluster data using single linkage hierarchical
        clustering
    '''
    # return the linkage object...
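As a minimal sketch of single-linkage clustering with SciPy (not necessarily the body of findClusters_link), the following assumes the linkage function from scipy.cluster.hierarchy called with method='single'. The helper name link_single and the random sample data are hypothetical.

```python
import numpy as np
import scipy.cluster.hierarchy as hc

def link_single(data):
    '''
        Hypothetical helper: build a single-linkage
        hierarchy from a 2-D array of observations
    '''
    # each row of the result describes one merge:
    # [cluster_a, cluster_b, distance, size of new cluster]
    return hc.linkage(data, method='single')

# illustrative random data: 50 observations, 3 features
rng = np.random.default_rng(42)
sample = rng.normal(size=(50, 3))

linkage_matrix = link_single(sample)

# compute the dendrogram layout without drawing it
tree = hc.dendrogram(linkage_matrix, no_plot=True)
```

For n observations, the linkage matrix has n - 1 rows, one per merge. If your data lives in a pandas DataFrame, passing its numeric values (for example, through the .values attribute) should work the same way.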