Learning SciPy for Numerical and Scientific Computing Second Edition

Clustering


Another technique used in data mining is clustering. SciPy provides two modules for this field, each addressing a different clustering tool: scipy.cluster.vq for k-means and scipy.cluster.hierarchy for hierarchical clustering.

Vector quantization and k-means

We have two routines to divide data into clusters using the k-means technique: kmeans and kmeans2. They correspond to two different implementations. The former has a very simple syntax:

kmeans(obs, k_or_guess, iter=20, thresh=1e-05)

The obs parameter is an ndarray with the data we wish to cluster. If the array has dimensions m x n, the algorithm interprets the data as m points in n-dimensional Euclidean space. If we know the number of clusters into which the data should be divided, we pass it as the k_or_guess argument. The output is a tuple with two elements. The first is an ndarray of dimension k x n, representing a collection of points, one for each cluster requested. Each of these locations...
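As a minimal sketch of the call described above, the following example clusters synthetic data with kmeans. The two Gaussian blobs, their centers, and the variable names (blob_a, blob_b, codebook, distortion) are assumptions made for illustration and do not come from the book. The second element of the returned tuple is the distortion, that is, the mean Euclidean distance between the observations and their nearest centroid.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Illustrative data (an assumption, not from the book): two well-separated
# blobs of 50 points each in the plane.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))

# obs has shape (m, n) = (100, 2): 100 points in 2-dimensional space.
obs = np.vstack([blob_a, blob_b])

# Ask for k = 2 clusters, spelling out the default iter and thresh values.
codebook, distortion = kmeans(obs, 2, iter=20, thresh=1e-05)

# codebook is a k x n array holding one centroid per row.
print(codebook.shape)
print(codebook)
print(distortion)
```

With data this well separated, the two rows of codebook should land close to the blob centers, although the order of the rows is not guaranteed. The SciPy documentation also suggests rescaling the features with scipy.cluster.vq.whiten before clustering when they have very different variances. That step is skipped here because both coordinates share the same scale.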