Haskell Data Analysis Cookbook

By: Nishant Shukla

Using Directed Acyclic Word Graphs


We use a Directed Acyclic Word Graph (DAWG) to look up strings in a large corpus very quickly while using very little space. A DAWG is like a trie in which identical suffixes share nodes, so words such as "taps" and "tops" store their common ending only once. Imagine compressing all the words of a dictionary into a DAWG so that word lookups stay efficient. It is a powerful data structure that comes in handy when dealing with a large corpus of words. A very nice introduction to DAWGs can be found in Steve Hanov's blog post: http://stevehanov.ca/blog/index.php?id=115.
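To make the space saving concrete, here is a small self-contained sketch. It uses only `Data.Map.Strict` from the `containers` package, not the dawg package's API. It first builds a plain trie. It then works bottom-up and merges every pair of nodes that have the same end-of-word flag and the same outgoing edges, which yields a minimal DAWG. All names (`Trie`, `insertWord`, `minimize`, `fromWords`, `member`) are hypothetical. This offline "trie, then merge" construction is one simple approach and is not necessarily how the dawg package builds its graphs.

```haskell
-- Hypothetical illustration of a DAWG built from a trie;
-- this is NOT the API of the dawg package.
import qualified Data.Map.Strict as M
import Data.List (mapAccumL)

-- A plain trie node: end-of-word flag plus children keyed by letter.
data Trie = Trie Bool (M.Map Char Trie)

emptyTrie :: Trie
emptyTrie = Trie False M.empty

insertWord :: String -> Trie -> Trie
insertWord [] (Trie _ kids) = Trie True kids
insertWord (c:cs) (Trie end kids) =
  Trie end (M.insert c (insertWord cs child) kids)
  where child = M.findWithDefault emptyTrie c kids

-- Number of nodes in the unshared trie, for comparison.
trieSize :: Trie -> Int
trieSize (Trie _ kids) = 1 + sum (map trieSize (M.elems kids))

-- A DAWG state is described by its end-of-word flag and its outgoing
-- edges (letter, target state id).
type Sig = (Bool, [(Char, Int)])

data DAWG = DAWG
  { root   :: Int            -- id of the start state
  , states :: M.Map Int Sig  -- state id -> signature
  }

-- Bottom-up merging: children get ids before their parent, and a
-- signature that was already seen reuses the existing id, so nodes that
-- accept the same set of suffixes collapse into a single state.
minimize :: Trie -> DAWG
minimize t = DAWG r (M.fromList [ (i, s) | (s, i) <- M.toList table ])
  where
    (r, table) = go t M.empty
    go (Trie end kids) tbl0 =
      let visit tbl (c, child) =
            let (i, tbl') = go child tbl in (tbl', (c, i))
          (tbl1, edges) = mapAccumL visit tbl0 (M.toList kids)
          sig = (end, edges)
      in case M.lookup sig tbl1 of
           Just i  -> (i, tbl1)
           Nothing -> let i = M.size tbl1 in (i, M.insert sig i tbl1)

fromWords :: [String] -> DAWG
fromWords = minimize . foldr insertWord emptyTrie

-- Follow one edge per letter; the word is present only if the walk
-- ends in a state whose end-of-word flag is set.
member :: String -> DAWG -> Bool
member w (DAWG r sts) = walk r w
  where
    walk s []     = fst (sts M.! s)
    walk s (c:cs) = case lookup c (snd (sts M.! s)) of
      Just s' -> walk s' cs
      Nothing -> False
```

For the words tap, taps, top, and tops, the trie has 8 nodes, but the DAWG has only 5 states. The paths "ta"/"to" and "tap"/"top" accept identical suffix sets, so they collapse into shared states. In GHCi, `member "tops" (fromWords ["tap", "taps", "top", "tops"])` is `True`, while `member "to"` on the same graph is `False` because "to" is only a prefix.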

We can use this recipe to incorporate a DAWG in our code.

Getting ready

Install the DAWG package using cabal:

$ cabal install dawg

How to do it...

We create a new file named Main.hs and insert the following code:

  1. Import the following packages:

    import qualified Data.DAWG.Static as D
    import Network.HTTP ( simpleHTTP, getRequest,  
                          getResponseBody)
    import Data.Char (toLower, isAlphaNum, isSpace)
    import Data.Maybe (isJust)
  2. In main, download a large corpus of text to store in the DAWG:

    main = do
      let url...
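The imports in step 1 (`toLower`, `isAlphaNum`, `isSpace`) suggest that the downloaded text is normalized into lowercase words before it goes into the DAWG. The following is a minimal sketch of such a tokenizer. The name `getWords` is hypothetical, and the recipe's own helper may differ.

```haskell
-- Hypothetical tokenizer; the recipe's own helper may differ.
import Data.Char (toLower, isAlphaNum, isSpace)

-- Lowercase the text, drop every character that is neither alphanumeric
-- nor whitespace, then split on whitespace.
getWords :: String -> [String]
getWords = words . filter keep . map toLower
  where keep c = isAlphaNum c || isSpace c
```

For example, `getWords "Hello, World! It's 2024."` yields `["hello","world","its","2024"]`. Punctuation is removed rather than replaced by a space, so "It's" becomes "its". Whatever normalization is chosen, queries must pass through the same function before they are looked up; otherwise a word like "Hello" will not match.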