Haskell Data Analysis Cookbook

By: Nishant Shukla

Fixing spelling mistakes


When gathering human-provided data, spelling mistakes may sneak in. This recipe will correct a misspelled word using Peter Norvig's simple heuristic spellchecker described at http://norvig.com/spell-correct.html.

This recipe is just one approach to a very difficult problem in machine learning. We can use it as a starting point, or as inspiration, for a more powerful solution with better results.

Getting ready

Review Norvig's Python spell-correction algorithm at http://norvig.com/spell-correct.html.

The core algorithm works as follows:

  • Transform raw text into lowercase alphabetical words

  • Compute a frequency map of all the words

  • Define functions to produce all strings within an edit distance of one or two

  • Find all candidate corrections for a misspelling by looking up the valid words within an edit distance of one or two

  • Finally, pick out the candidate with the highest frequency of occurrence in the trained corpus
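
The five steps above can be sketched compactly in Haskell. This is a minimal illustration rather than the recipe's own listing: the function names (tokenize, train, edits1, edits2, known, candidates, correct) are choices made for this sketch, loosely following Norvig's Python, and it assumes the containers package is available for Data.Map and Data.Set.

```haskell
import Data.Char (isAlpha, toLower)
import Data.List (maximumBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Step 1: lowercase the text and split it into alphabetical words.
-- Every non-letter character is treated as a separator.
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlpha c then toLower c else ' ')

-- Step 2: build a frequency map of all the words.
train :: [String] -> Map.Map String Int
train ws = Map.fromListWith (+) [(w, 1) | w <- ws]

-- Step 3a: all strings one edit away (delete, transpose, replace, insert).
edits1 :: String -> Set.Set String
edits1 word = Set.fromList (deletes ++ transposes ++ replaces ++ inserts)
  where
    splits     = [splitAt i word | i <- [0 .. length word]]
    alphabet   = ['a' .. 'z']
    deletes    = [a ++ bs         | (a, _ : bs)      <- splits]
    transposes = [a ++ (c : b : r) | (a, b : c : r)  <- splits]
    replaces   = [a ++ (c : bs)   | (a, _ : bs)      <- splits, c <- alphabet]
    inserts    = [a ++ (c : b)    | (a, b)           <- splits, c <- alphabet]

-- Step 3b: all strings two edits away, by applying edits1 twice.
edits2 :: String -> Set.Set String
edits2 = Set.unions . map edits1 . Set.toList . edits1

-- Keep only the strings that occur in the trained corpus.
known :: Map.Map String Int -> Set.Set String -> Set.Set String
known freq = Set.filter (`Map.member` freq)

-- Step 4: prefer the word itself if known, then known words one edit
-- away, then two edits away; fall back to the input if nothing matches.
candidates :: Map.Map String Int -> String -> Set.Set String
candidates freq word =
  case filter (not . Set.null) options of
    (s : _) -> s
    []      -> Set.singleton word
  where
    options =
      [ known freq (Set.singleton word)
      , known freq (edits1 word)
      , known freq (edits2 word)
      ]

-- Step 5: pick the candidate with the highest corpus frequency.
correct :: Map.Map String Int -> String -> String
correct freq word =
  maximumBy (comparing count) (Set.toList (candidates freq word))
  where
    count w = Map.findWithDefault 0 w freq
```

For example, with `freq = train (tokenize "the quick brown fox jumps over the lazy dog")`, the expression `correct freq "teh"` evaluates to `"the"`, because the transposition of the last two letters is the only known word within one edit. A real corpus would be far larger. The two-edit search is generated lazily here, but it can still be slow for long words.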

The Haskell algorithm below mimics this Python code...