Haskell Data Analysis Cookbook

By: Nishant Shukla

Computing the Jaro-Winkler distance between two strings


The Jaro-Winkler distance measures the similarity between two strings as a real number between 0 and 1. A value of 0 means the strings have nothing in common, and a value of 1 means they match exactly.

Getting ready

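The algorithm behind the function follows the mathematical formula presented in the Wikipedia article about the Jaro-Winkler distance (http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance):

    d_j = \begin{cases} 0 & \text{if } m = 0 \\ \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right) & \text{otherwise} \end{cases}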

In the preceding formula, the following are the representations of the variables used:

  • s1 is the first string.

  • s2 is the second string.

  • m is the number of matching characters, that is, characters that are identical in both strings and whose positions differ by at most half the length of the longer string (the Wikipedia article gives the exact window as ⌊max(|s1|, |s2|)/2⌋ − 1).

  • t is half the number of matching characters that are not at the same index in both strings. In other words, it is half the number of transpositions.
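
The variables above can be turned into a computation along the following lines. This is a minimal sketch and not the book's implementation: the names `matches` and `jaro` are chosen here for illustration, the match window is assumed to be ⌊max(|s1|, |s2|)/2⌋ − 1 as in the Wikipedia article, and transpositions are counted by comparing the matched characters in order, which may differ in detail from the recipe's own definition. It covers only the base similarity built from m and t, without any prefix adjustment.

```haskell
import Data.List (foldl', sort)

-- | Greedily pair each character of s1 with the first unused, equal
-- character of s2 that lies within the match window.
-- Returns (index in s1, index in s2) pairs, ordered by the s1 index.
matches :: Eq a => [a] -> [a] -> [(Int, Int)]
matches s1 s2 = reverse (foldl' step [] (zip [0 ..] s1))
  where
    -- Assumed window: floor(max(|s1|, |s2|) / 2) - 1, never negative.
    window   = max 0 (max (length s1) (length s2) `div` 2 - 1)
    indexed2 = zip [0 ..] s2
    step acc (i, c) =
      case [ j | (j, c') <- indexed2
               , c' == c
               , abs (i - j) <= window
               , j `notElem` map snd acc ] of
        (j : _) -> (i, j) : acc
        []      -> acc

-- | Similarity in [0, 1]: 0 when nothing matches, 1 for identical strings.
jaro :: Eq a => [a] -> [a] -> Double
jaro s1 s2
  | m == 0    = 0
  | otherwise = (m / l1 + m / l2 + (m - t) / m) / 3
  where
    ps = matches s1 s2
    m  = fromIntegral (length ps)
    l1 = fromIntegral (length s1)
    l2 = fromIntegral (length s2)
    -- Matched characters as they appear in s1 and as they appear in s2.
    inS1Order = [ s1 !! i | (i, _) <- ps ]
    inS2Order = [ s2 !! j | j <- sort (map snd ps) ]
    -- t is half the number of positions where the two sequences disagree.
    t = fromIntegral (length (filter id (zipWith (/=) inS1Order inS2Order))) / 2
```

For example, `jaro "MARTHA" "MARHTA"` finds six matching characters and one transposition (the swapped T and H), giving (1 + 1 + 5/6) / 3, roughly 0.944.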

How to do it...

  1. We will need access to the elemIndices function, which is imported as follows:

    import Data.List (elemIndices)
  2. Define...
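
Step 1 relies on `elemIndices` from `Data.List`, which returns every position at which a value occurs in a list. As a small illustration of its behavior, with `positionsOf` being a hypothetical wrapper name that does not come from the book:

```haskell
import Data.List (elemIndices)

-- | Illustrative wrapper (hypothetical name): all zero-based positions
-- at which a character occurs in a string.
positionsOf :: Char -> String -> [Int]
positionsOf = elemIndices
```

Here `positionsOf 'a' "banana"` evaluates to `[1,3,5]`, and a character that does not occur yields the empty list.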