Haskell Data Analysis Cookbook

By: Nishant Shukla

Using Directed Acyclic Word Graphs

We use Directed Acyclic Word Graphs (DAWGs) to look up strings in a large corpus very quickly while using very little space. Imagine compressing all the words of a dictionary into a DAWG so that word lookups stay efficient. A DAWG saves space by storing shared prefixes and shared suffixes only once, which makes it a powerful data structure when dealing with a large corpus of words. A very nice introduction to DAWGs can be found in Steve Hanov's blog post.

We can use this recipe to incorporate a DAWG in our code.
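
To make the space saving concrete, here is a minimal sketch of the idea using only Data.Map from the containers library. It is not how the dawg package is implemented, and every name in it (Trie, Dawg, insertWord, minimise, member) is hypothetical and chosen for illustration. It builds a plain trie and then merges structurally identical subtrees, which is roughly what distinguishes a DAWG from a trie:

```haskell
import qualified Data.Map.Strict as M

-- A plain trie: the flag marks the end of a word; children are keyed by character.
data Trie = Trie Bool (M.Map Char Trie)

emptyTrie :: Trie
emptyTrie = Trie False M.empty

insertWord :: String -> Trie -> Trie
insertWord []     (Trie _ kids)   = Trie True kids
insertWord (c:cs) (Trie end kids) = Trie end (M.insert c child kids)
  where child = insertWord cs (M.findWithDefault emptyTrie c kids)

-- Number of nodes in the trie, including the root.
trieSize :: Trie -> Int
trieSize (Trie _ kids) = 1 + sum (map trieSize (M.elems kids))

-- In the DAWG sketch, nodes are referred to by id so they can be shared.
type NodeId = Int
type Node   = (Bool, M.Map Char NodeId)

data Dawg = Dawg NodeId (M.Map NodeId Node)

-- Walk the trie bottom-up. Two subtrees with the same end flag and the same
-- (already minimised) children get the same id, so they are stored only once.
minimise :: Trie -> Dawg
minimise trie = Dawg rootId (M.fromList [(i, n) | (n, i) <- M.toList registry])
  where
    (rootId, registry) = go trie M.empty

    go :: Trie -> M.Map Node NodeId -> (NodeId, M.Map Node NodeId)
    go (Trie end kids) reg0 =
      case M.lookup sig reg1 of
        Just i  -> (i, reg1)
        Nothing -> (M.size reg1, M.insert sig (M.size reg1) reg1)
      where
        (kidIds, reg1) = M.foldlWithKey' step (M.empty, reg0) kids
        step (acc, reg) c child =
          let (i, reg') = go child reg in (M.insert c i acc, reg')
        sig = (end, kidIds)

dawgSize :: Dawg -> Int
dawgSize (Dawg _ nodes) = M.size nodes

-- Follow one edge per character; the word is present if we stop on an end node.
member :: String -> Dawg -> Bool
member word (Dawg rootId nodes) = walk word rootId
  where
    walk []     i = maybe False fst (M.lookup i nodes)
    walk (c:cs) i = case M.lookup i nodes >>= M.lookup c . snd of
      Just j  -> walk cs j
      Nothing -> False

main :: IO ()
main = do
  let ws   = ["tap", "taps", "top", "tops"]
      trie = foldr insertWord emptyTrie ws
      dawg = minimise trie
  print (trieSize trie)                              -- 8 trie nodes
  print (dawgSize dawg)                              -- 5 nodes after merging
  print (map (`member` dawg) ["tops", "to", "tip"])  -- [True,False,False]
```

For these four words, the trie needs 8 nodes while the merged graph needs 5, because "tap" and "top" share the same continuation. A lookup still takes one step per character, so the compression does not slow queries down.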

Getting ready

Install the DAWG package using cabal:

$ cabal install dawg

How to do it...

We name a new file Main.hs and insert the following code:

  1. Import the following modules:

    import qualified Data.DAWG.Static as D
    import Network.HTTP (simpleHTTP, getRequest, getResponseBody)
    import Data.Char (toLower, isAlphaNum, isSpace)
    import Data.Maybe (isJust)

  2. In main, download a large corpus of text to store:

    main = do
      let url...