Book Image

Haskell Data Analysis Cookbook

By : Nishant Shukla
Book Image

Haskell Data Analysis Cookbook

By: Nishant Shukla

Overview of this book

Table of Contents (19 chapters)
Haskell Data Analysis Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Computing the edit distance


The edit distance or Levenshtein distance is the minimum number of simple string operations required to convert one string into another. In this recipe, we will compute the edit distance based on only insertions, deletions, and substitutions of characters.

Getting ready

Review the equation shown in the following figure obtained from the Wikipedia article about the Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance):

Here, a and b are the two strings, and i and j are numbers representing their lengths.

The Haskell code will be a direct translation of this mathematical formula.

Also, install the data-memocombinators package from Cabal. This allows us to minimize redundant computations to improve runtime.

$ cabal install data-memocombinators

How to do it...

  1. The only import we will need is the ability to easily memoize functions using the following line of code:

    import qualified Data.MemoCombinators as Memo
  2. Define the Levenshtein distance function exactly...