Haskell Data Analysis Cookbook
Sometimes we do not know the number of clusters in a dataset, yet most clustering algorithms require this information a priori. One way to find it is to run the clustering algorithm for every candidate number of clusters and compute the average variance of the resulting clusters. We can then plot the average variance against the number of clusters and pick the number of clusters at the first fluctuation of the curve, that is, the first point where the curve visibly changes its behavior.
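Choosing the bend can be automated. The following is a minimal sketch, not the recipe's method: it assumes the curve is a list of `(k, averageVariance)` pairs sorted by consecutive `k`, and it uses one common heuristic in place of eyeballing the plot, namely picking the interior `k` with the largest second difference (where the curve bends most sharply). The names `varianceCurve` and `elbowK` are hypothetical.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical helper: evaluate an average-variance function for each
-- candidate number of clusters k.
varianceCurve :: (Int -> Double) -> [Int] -> [(Int, Double)]
varianceCurve avgVarFor ks = [ (k, avgVarFor k) | k <- ks ]

-- Assumed heuristic (not from the recipe): the elbow is the interior k whose
-- second difference v(k-1) - 2*v(k) + v(k+1) is largest.
-- Expects the curve sorted by k with consecutive k values.
-- Returns Nothing when there are fewer than three points.
elbowK :: [(Int, Double)] -> Maybe Int
elbowK curve =
  case bends of
    [] -> Nothing
    _  -> Just (fst (maximumBy (comparing snd) bends))
  where
    bends =
      [ (k, prev - 2 * cur + next)
      | ((_, prev), (k, cur), (_, next)) <- zip3 curve (drop 1 curve) (drop 2 curve)
      ]
```

For a made-up curve `[(1, 10), (2, 4), (3, 3), (4, 2.5)]`, `elbowK` returns `Just 2`, because the drop from one to two clusters is much larger than the drops that follow. Other heuristics (for example, a threshold on the relative decrease) are equally valid; the plot remains the final check.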
Review the k-means recipe titled Implementing the k-means clustering algorithm. We will use the kmeans and assign functions defined in that recipe.
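The referenced recipe is not reproduced here. For readers without it at hand, the following is a hypothetical stand-in for `kmeans` and `assign`, assuming one-dimensional points, caller-supplied initial centroids (a non-empty list), and plain Lloyd iterations with an iteration cap. The real recipe's signatures and point type may differ.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Index of the centroid closest to point p (ties go to the lower index).
-- Assumes a non-empty list of centroids.
nearest :: [Double] -> Double -> Int
nearest centroids p =
  fst (minimumBy (comparing (\(_, c) -> abs (p - c))) (zip [0 ..] centroids))

-- Hypothetical assign: one (possibly empty) list of points per centroid,
-- in the same order as the centroids.
assign :: [Double] -> [Double] -> [[Double]]
assign centroids points =
  [ [ p | p <- points, nearest centroids p == i ] | i <- [0 .. length centroids - 1] ]

-- Move each centroid to the mean of its cluster; keep it in place if empty.
relocate :: [Double] -> [[Double]] -> [Double]
relocate = zipWith step
  where
    step c [] = c
    step _ ps = sum ps / fromIntegral (length ps)

-- Hypothetical kmeans: repeat assign/relocate until the centroids stop
-- moving or maxIter iterations have run.
kmeans :: Int -> [Double] -> [Double] -> [Double]
kmeans maxIter centroids points
  | maxIter <= 0 || centroids' == centroids = centroids
  | otherwise = kmeans (maxIter - 1) centroids' points
  where
    centroids' = relocate centroids (assign centroids points)
```

Starting from centroids `[1, 10]` on the points `[1, 2, 3, 10, 11, 12]`, this sketch settles on `[2.0, 11.0]` after one update.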
Install the statistics package using cabal:
$ cabal install statistics
Create a new file named Main.hs and insert the following code.
Import the variance function and the helper fromList function:
import Statistics.Sample (variance)
import Data.Vector.Unboxed (fromList)
avgVar points centroids...
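The definition of avgVar is cut off above. As a hypothetical sketch of what such a function could compute, assuming one-dimensional points, a non-empty point list, and nearest-centroid grouping, the code below averages the variances of the non-empty clusters. To stay dependency-free it defines `popVariance` by hand with denominator n, which is the definition `Statistics.Sample.variance` documents; in the recipe itself you would call `variance (fromList xs)` instead. Skipping empty clusters is an assumption of this sketch.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Population variance (denominator n), the same definition as
-- Statistics.Sample.variance. Assumes a non-empty list.
popVariance :: [Double] -> Double
popVariance xs = sum [ (x - m) * (x - m) | x <- xs ] / n
  where
    n = fromIntegral (length xs)
    m = sum xs / n

-- Hypothetical avgVar: group each point with its nearest centroid, drop
-- empty groups, and return the mean of the per-group variances.
avgVar :: [Double] -> [Double] -> Double
avgVar points centroids =
  sum (map popVariance clusters) / fromIntegral (length clusters)
  where
    nearest p =
      snd (minimumBy (comparing (\(c, _) -> abs (p - c))) (zip centroids [0 :: Int ..]))
    clusters =
      filter (not . null)
        [ [ p | p <- points, nearest p == i ] | i <- [0 .. length centroids - 1] ]
```

Evaluating this for the centroids returned by k-means at each k = 1, 2, 3, … produces the curve whose first fluctuation indicates the number of clusters.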