Book Image

Haskell Data Analysis Cookbook

By : Nishant Shukla
Book Image

Haskell Data Analysis Cookbook

By: Nishant Shukla

Overview of this book

Table of Contents (19 chapters)
Haskell Data Analysis Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Searching a string using the Rabin-Karp algorithm


The Rabin-Karp algorithm finds a pattern in a body of text by matching a unique representation of the pattern against a moving window. The unique representation, or hash, is computed by considering a string as a number written in an arbitrary base of 26 or greater.

The advantage of Rabin-Karp is in searching for many needles in a haystack. It's not very efficient to search for just a single string. After the initial preprocessing of the corpus, the algorithm can quickly find matches.

Getting ready

Install the Data.ByteString.Search library from Cabal as follows:

$ cabal install stringsearch

How to do it...

  1. Use the OverloadedStrings language extension to facilitate the ByteString manipulations in our code as follows. It essentially allows polymorphic behavior for strings so that the GHC compiler may infer it as a ByteString type when necessary:

    {-# LANGUAGE OverloadedStrings #-}
  2. Import the Rabin-Karp algorithms as follows:

    import Data.ByteString...