Fixing spelling mistakes
When gathering human-provided data, spelling mistakes may sneak in. This recipe will correct a misspelled word using Peter Norvig's simple heuristic spellchecker described at http://norvig.com/spell-correct.html.
This recipe is just one approach to spelling correction, which is a very difficult problem in machine learning. We can use it as a starting point, or as inspiration for a more powerful solution with better results.
Getting ready
Refer to Norvig's spell-correction Python algorithm located at http://norvig.com/spell-correct.html.
The core algorithm works as follows:
Transform raw text into lowercase alphabetical words
Compute a frequency map of all the words
Define functions to produce all strings within an edit distance of one or two
Find all candidate corrections for a misspelling by keeping only those strings within an edit distance of one or two that are valid words in the frequency map
Finally, pick the candidate with the highest frequency of occurrence in the training corpus
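The five steps above can be sketched in Haskell as follows. This is a minimal illustration, not the recipe's own code. It assumes a `Data.Map.Strict` frequency table from the `containers` package and a 26-letter lowercase alphabet. The names `tokenize`, `train`, `edits1`, `edits2`, `known` and `correct` are hypothetical and were chosen to echo Norvig's Python. Like Norvig's version, it prefers a known word at distance zero, then distance one, then distance two, and falls back to the input word unchanged.

```haskell
import Data.Char (isAlpha, toLower)
import Data.List (maximumBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Step 1: lowercase the text and split it into purely alphabetical words.
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlpha c then toLower c else ' ')

-- Step 2: count how often each word occurs (assumed Map-based frequency table).
train :: [String] -> Map.Map String Int
train = Map.fromListWith (+) . map (\w -> (w, 1))

-- Assumption: only the lowercase English letters are used for edits.
alphabet :: String
alphabet = ['a' .. 'z']

-- Step 3a: every string one deletion, transposition, replacement or insertion away.
edits1 :: String -> Set.Set String
edits1 w = Set.fromList (deletes ++ transposes ++ replaces ++ inserts)
  where
    splits     = [splitAt i w | i <- [0 .. length w]]
    deletes    = [l ++ rs | (l, _ : rs) <- splits]
    transposes = [l ++ b : a : rs | (l, a : b : rs) <- splits]
    replaces   = [l ++ c : rs | (l, _ : rs) <- splits, c <- alphabet]
    inserts    = [l ++ c : r | (l, r) <- splits, c <- alphabet]

-- Step 3b: every string reachable by applying edits1 twice.
edits2 :: String -> Set.Set String
edits2 w = Set.unions [edits1 e | e <- Set.toList (edits1 w)]

-- Step 4: keep only the strings that appear in the frequency map.
known :: Map.Map String Int -> Set.Set String -> Set.Set String
known freq = Set.filter (`Map.member` freq)

-- Step 5: take the first non-empty candidate set and return its most frequent word.
correct :: Map.Map String Int -> String -> String
correct freq w = maximumBy (comparing count) (Set.toList candidates)
  where
    count c = Map.findWithDefault 0 c freq
    candidates =
      case filter (not . Set.null)
             [ known freq (Set.singleton w)
             , known freq (edits1 w)
             , known freq (edits2 w)
             ] of
        (c : _) -> c
        []      -> Set.singleton w  -- nothing known within distance two
```

For example, after `let freq = train (tokenize corpusText)`, calling `correct freq "speling"` returns `"spelling"` as long as `"spelling"` appears in the corpus. Trying the cheaper distance-one edits before the much larger distance-two set mirrors Norvig's assumption that smaller edits are more likely.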
The Haskell algorithm below mimics this Python code...