Learning SciPy for Numerical and Scientific Computing Second Edition

Another technique used in data mining is clustering. SciPy provides two modules for problems in this field, each addressing a different clustering tool: scipy.cluster.vq for k-means and scipy.cluster.hierarchy for hierarchical clustering.
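As a hedged illustration of the second module, the sketch below builds a hierarchical clustering of hypothetical two-group data with scipy.cluster.hierarchy.linkage and then cuts the resulting tree into flat clusters with fcluster. The toy data, the Ward linkage method, and the choice of two clusters are assumptions made for this example and do not come from the text.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated groups of 2-D points
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(20, 2))
data = np.vstack([group_a, group_b])

# Build the agglomerative cluster tree; each row of Z records one merge
Z = linkage(data, method="ward")

# Cut the tree so that at most two flat clusters remain (labels start at 1)
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix Z has one row per merge (m - 1 rows for m points), so the same tree can be cut at different levels without recomputing it.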

Vector quantization and k-means

We have two routines for dividing data into clusters with the k-means technique, kmeans and kmeans2, which correspond to two different implementations. The former has a very simple syntax:

kmeans(obs, k_or_guess, iter=20, thresh=1e-05)
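A minimal sketch of the two ways the k_or_guess argument can be used: as an integer number of clusters, or as an array of initial centroids. The synthetic three-blob data and the whiten preprocessing step (also part of scipy.cluster.vq) are assumptions added for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

rng = np.random.default_rng(1)
# Hypothetical data: m = 60 points in n = 2 dimensions, grouped in three blobs
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
obs = np.vstack([c + rng.normal(scale=0.4, size=(20, 2)) for c in centers])

# Rescale each feature to unit variance before clustering
obs_w = whiten(obs)

# Option 1: an integer k; starting centroids are drawn at random from obs,
# and the best (lowest-distortion) result of `iter` runs is returned
codebook, distortion = kmeans(obs_w, 3)

# Option 2: a k x n array of initial guesses; `iter` is ignored (single run)
guess = obs_w[[0, 20, 40]]
codebook_g, distortion_g = kmeans(obs_w, guess)
print(codebook_g.shape)
```

Passing explicit guesses makes the result reproducible, while the integer form trades that for several random restarts.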

The obs parameter is an ndarray holding the data we wish to cluster. If the array has dimensions m x n, the algorithm interprets the data as m points in n-dimensional Euclidean space. If we know the number of clusters into which the data should be divided, we pass it through the k_or_guess parameter. The output is a tuple with two elements. The first is an ndarray of dimension k x n, representing a collection of points, one for each requested cluster. Each of these locations...
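The following sketch inspects the output of kmeans on hypothetical two-blob data, then uses vq from the same module to assign each observation to its nearest centroid, and compares the result with kmeans2, which returns per-observation labels directly. The data and initial guesses are assumptions for illustration. Per the SciPy documentation, centroids that end up with no members are dropped, so the codebook can have fewer than k rows.

```python
import numpy as np
from scipy.cluster.vq import kmeans, kmeans2, vq

rng = np.random.default_rng(2)
# Hypothetical data: m = 40 points in n = 3 dimensions, two blobs
obs = np.vstack([rng.normal(0.0, 0.3, size=(20, 3)),
                 rng.normal(4.0, 0.3, size=(20, 3))])

# Initial guesses: one observation taken from each blob
guess = obs[[0, 20]]

# kmeans returns (codebook, distortion); each codebook row is a centroid,
# and distortion is the mean distance from observations to nearest centroid
codebook, distortion = kmeans(obs, guess)
print(codebook.shape)  # (2, 3)

# vq maps each observation to the index of its nearest centroid
labels, dists = vq(obs, codebook)

# kmeans2 returns (centroids, labels) in a single call
centroids2, labels2 = kmeans2(obs, guess, minit="matrix")
```

The kmeans/vq pair separates "learn the codebook" from "encode the data", which is convenient when a codebook is reused on new observations; kmeans2 bundles both steps.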
