Haskell Data Analysis Cookbook

By: Nishant Shukla

Clustering words by their lexemes


Words that look alike can be clustered together. The clustering algorithm in the lexeme-clustering package is based on Janicki's research paper, "A Lexeme-Clustering Algorithm for Unsupervised Learning of Morphology". The paper is available at the following URL: http://skil.informatik.uni-leipzig.de/blog/wp-content/uploads/proceedings/2012/Janicki2012.37.pdf.
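To make the idea of grouping look-alike words concrete, here is a minimal sketch that clusters words sharing the same first few letters. It is a deliberately naive stand-in and not the algorithm from Janicki's paper or the lexeme-clustering package. The function name `clusterByPrefix` and the fixed prefix length are assumptions made purely for illustration. It relies on `Data.Map.Strict` from the containers package.

```haskell
import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- | Hypothetical helper (not part of lexeme-clustering): group words whose
-- first n letters match, ignoring case. This is a naive prefix heuristic,
-- NOT the lexeme-clustering algorithm described in the paper.
clusterByPrefix :: Int -> [String] -> [[String]]
clusterByPrefix n ws =
  -- 'flip (++)' appends each new word after the ones already collected,
  -- so every cluster keeps the order in which its words appeared.
  Map.elems (Map.fromListWith (flip (++)) [(key w, [w]) | w <- ws])
  where
    -- The cluster key is the lowercased prefix of length n.
    key = take n . map toLower
```

For example, in GHCi, `clusterByPrefix 4 ["walk", "walked", "walking", "talk", "talked", "jump"]` yields `[["jump"],["talk","talked"],["walk","walked","walking"]]`, with clusters ordered by their prefix key. A fixed-length prefix misses words related by anything other than a common beginning, which is the kind of limitation a dedicated morphology-learning approach is meant to address.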

Getting ready

An Internet connection is required to download the package from GitHub.

How to do it…

Follow these steps to install and use the library:

  1. Obtain the lexeme-clustering library from GitHub. If Git is not installed, download the archive from https://github.com/BinRoot/lexeme-clustering/archive/master.zip; otherwise, enter the following command:

    $ git clone https://github.com/BinRoot/lexeme-clustering
    
  2. Change into the library's directory:

    $ cd lexeme-clustering/
    
  3. Install the package:

    $ cabal install
    
  4. Create an input file with a different word on each line:

    ...