Another technique used in data mining is clustering. SciPy provides two modules for it, each addressing a different clustering tool: scipy.cluster.vq for k-means and scipy.cluster.hierarchy for hierarchical clustering.
We have two routines to divide data into clusters using the k-means technique: kmeans and kmeans2. They correspond to two different implementations. The former has a very simple syntax:
kmeans(obs, k_or_guess, iter=20, thresh=1e-05)
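As a rough illustration of this call, the sketch below builds a small synthetic data set and asks kmeans for two clusters. The data (two groups of 50 points in the plane, centered near (0, 0) and (5, 5)) and the variable names group_a, group_b, centroids, and distortion are assumptions made for this example only. The array obs holds one point per row, and the second argument is the number of clusters requested.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Synthetic data (hypothetical, for illustration only): two groups of
# 50 points each in the plane, centered near (0, 0) and (5, 5).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
obs = np.vstack([group_a, group_b])   # shape (100, 2): 100 points in 2-D

# Ask for two clusters; iter and thresh keep their default values.
centroids, distortion = kmeans(obs, 2)

print(centroids.shape)   # one row per cluster, one column per dimension
print(centroids)         # the two centroids (row order is not guaranteed)
print(distortion)        # mean distance from each point to its nearest centroid
```

Because the initial centroids are picked at random when an integer is passed as the second argument, the order of the rows in centroids can differ from run to run, although with data this well separated the locations themselves should be stable.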
The obs parameter is an ndarray with the data we wish to cluster. If the dimensions of the array are m x n, the algorithm interprets this data as m points in n-dimensional Euclidean space. If we know the number of clusters into which this data should be divided, we pass it as the k_or_guess argument. The output is a tuple with two elements. The first is an ndarray of dimension k x n, representing a collection of points, as many as the number of clusters requested. Each of these locations...