## Clustering

Another technique used in data mining is clustering. SciPy provides two modules for it, each addressing a different clustering approach: `scipy.cluster.vq` for k-means and `scipy.cluster.hierarchy` for hierarchical clustering.

### Vector quantization and k-means

We have two routines to divide data into clusters using the k-means technique: `kmeans` and `kmeans2`. They correspond to two different implementations. The former has a very simple syntax:

kmeans(obs, k_or_guess, iter=20, thresh=1e-05)
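As a rough illustration of a call with this signature, the sketch below clusters synthetic data. The data set, the variable names, and the choice of two clusters are assumptions made for the example only.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Synthetic data (an assumption for this sketch): two well-separated
# blobs of 2-D points, so obs has shape (m, n) = (100, 2).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
obs = np.vstack([blob_a, blob_b])

# Ask for k = 2 clusters, keeping the default iter and thresh values.
codebook, distortion = kmeans(obs, 2, iter=20, thresh=1e-05)

print(codebook.shape)  # one row per cluster, one column per dimension
print(codebook)        # the cluster centroids found
print(distortion)      # mean distance from each point to its nearest centroid
```

Because `kmeans` picks its starting centroids at random when given only a cluster count, the order of the rows in `codebook` can differ from run to run.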

The `obs` parameter is an `ndarray` with the data we wish to cluster. If the dimensions of the array are *m* x *n*, the algorithm interprets this data as *m* points in *n*-dimensional Euclidean space. If we know the number of clusters into which this data should be divided, we pass it as the `k_or_guess` argument. The output is a tuple with two elements. The first is an `ndarray` of dimension *k* x *n*, representing a collection of points, one for each cluster requested. Each of these locations...