Haskell Data Analysis Cookbook

By: Nishant Shukla

Overview of this book

Step-by-step recipes filled with practical code samples and engaging examples demonstrate Haskell in practice, and then the concepts behind the code. This book shows functional developers and analysts how to leverage their existing knowledge of Haskell specifically for high-quality data analysis. A good understanding of data sets and functional programming is assumed.
Table of Contents (14 chapters)

Finding the number of clusters

Sometimes we do not know the number of clusters in a dataset, yet most clustering algorithms require this information a priori. One way to find the number of clusters is to run the clustering algorithm for each possible number of clusters and compute the average variance of the resulting clusters. We can then plot the average variance against the number of clusters and identify the number of clusters by finding the first fluctuation in the curve.
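The procedure just described can be sketched as a small, library-free Haskell module. It assumes a hypothetical function avgVarFor that runs the clustering with k clusters and returns the average cluster variance; varianceCurve, renderCurve, and firstFluctuation are likewise illustrative names, not functions from the recipe. It also assumes that "first fluctuation" means the first k after which the average variance stops decreasing, which is only one reading of the criterion.

```haskell
-- Pair each candidate number of clusters k with the average cluster
-- variance reported for it. avgVarFor is a hypothetical stand-in for
-- "run k-means with k clusters and average the per-cluster variances".
varianceCurve :: (Int -> Double) -> [Int] -> [(Int, Double)]
varianceCurve avgVarFor ks = [(k, avgVarFor k) | k <- ks]

-- Crude text chart of the curve: one row per k, with a bar whose
-- length is proportional to that k's average variance (40 chars max).
renderCurve :: [(Int, Double)] -> String
renderCurve curve =
  unlines [show k ++ " | " ++ replicate (bar v) '*' | (k, v) <- curve]
  where
    top = maximum (map snd curve)
    bar v = if top <= 0 then 0 else round (40 * v / top)

-- Assumed reading of "first fluctuation": the first k whose successor
-- does not lower the average variance, i.e. where the curve stops
-- decreasing. Returns Nothing if the curve decreases throughout.
firstFluctuation :: [(Int, Double)] -> Maybe Int
firstFluctuation curve =
  case [k | ((k, v), (_, v')) <- zip curve (drop 1 curve), v' >= v] of
    (k : _) -> Just k
    []      -> Nothing
```

Once avgVarFor is built on top of the k-means recipe's functions, you could load this in GHCi and run putStr (renderCurve (varianceCurve avgVarFor [1 .. 10])) to eyeball the curve, then firstFluctuation on the same curve to pick k under the assumption above.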

Getting ready

Review the k-means recipe titled Implementing the k-means clustering algorithm. We will be using the kmeans and assign functions defined in that recipe.

Install the statistics package using cabal:

$ cabal install statistics

How to do it…

Create a new file named Main.hs and insert the following code.

  1. Import the variance function and the helper fromList function:
    import Statistics.Sample (variance)
    import Data.Vector.Unboxed (fromList)
  2. Compute the average of the variance of each cluster:
    avgVar points centroids...
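The idea behind step 2 can be illustrated with a self-contained sketch that does not depend on the k-means recipe. It assumes, as one plausible reading, that a cluster's variance is the variance of its members' Euclidean distances to the centroid, and it represents each cluster as a hypothetical (centroid, members) pair of 2-D points. To avoid a package dependency it writes out the population variance (denominator n), which is the estimator Statistics.Sample.variance computes. The names Point, euclidean, popVariance, and avgClusterVariance are illustrative, not the recipe's own.

```haskell
-- Hypothetical 2-D point type; the k-means recipe may represent points differently.
type Point = (Double, Double)

-- Euclidean distance between two points.
euclidean :: Point -> Point -> Double
euclidean (x1, y1) (x2, y2) = sqrt ((x1 - x2) ^ 2 + (y1 - y2) ^ 2)

-- Population variance (denominator n). An empty list is treated as
-- zero variance, a choice made only for this sketch.
popVariance :: [Double] -> Double
popVariance [] = 0
popVariance xs = sum [(x - m) ^ 2 | x <- xs] / n
  where
    n = fromIntegral (length xs)
    m = sum xs / n

-- Average, over all clusters, of the variance of each cluster's
-- point-to-centroid distances. Each cluster is a (centroid, members) pair.
avgClusterVariance :: [(Point, [Point])] -> Double
avgClusterVariance [] = 0
avgClusterVariance clusters = sum vars / fromIntegral (length vars)
  where
    vars = [popVariance (map (euclidean c) ps) | (c, ps) <- clusters]
```

With the imports from step 1, popVariance xs corresponds to variance (fromList xs) for non-empty xs, so the hand-written helper can be swapped for the library call.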