When gathering human-provided data, spelling mistakes may sneak in. This recipe will correct a misspelled word using Peter Norvig's simple heuristic spellchecker described at http://norvig.com/spell-correct.html.
This recipe is just one approach to a very difficult problem in machine learning. We can use it as a starting point, or as inspiration, for a more powerful solution that produces better results.
Refer to Norvig's Python implementation of the spell-correction algorithm at http://norvig.com/spell-correct.html.
The core algorithm works as follows:
Transform raw text into lowercase alphabetical words
Compute a frequency map of all the words
Define functions to produce all strings within an edit distance of one or two
Find all candidate corrections for a misspelling by keeping only the strings within an edit distance of one or two that are valid words in the corpus
Finally, pick out the candidate with the highest frequency of occurrence in the trained corpus
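The steps above can be sketched in Python, the language of Norvig's original write-up. This is a minimal illustration and not the recipe's own implementation: the function names (words, train, edits1, edits2, known, correct), the lowercase a-z alphabet, the preference for closer edits over more distant ones, and the tiny SAMPLE corpus are all assumptions made for the example.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: plain English letters only

def words(text):
    """Step 1: lowercase the raw text and keep only alphabetical words."""
    return re.findall(r"[a-z]+", text.lower())

def train(text):
    """Step 2: build a frequency map of every word in the corpus."""
    return Counter(words(text))

def edits1(word):
    """Step 3a: every string exactly one edit away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """Step 3b: every string reachable by applying edits1 twice."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def known(candidates, freq):
    """Step 4: keep only candidates that appear in the corpus."""
    return {w for w in candidates if w in freq}

def correct(word, freq):
    """Step 5: return the most frequent candidate, preferring closer edits."""
    word = word.lower()
    candidates = (known([word], freq)
                  or known(edits1(word), freq)
                  or known(edits2(word), freq)
                  or {word})  # nothing found: return the input unchanged
    return max(candidates, key=lambda w: freq[w])

# Hypothetical toy corpus; a real corpus would be far larger.
SAMPLE = "The quick brown fox jumps over the lazy dog. The dog barks; the fox runs."
freq = train(SAMPLE)
print(correct("teh", freq))
print(correct("dgo", freq))
```

Generating the distance-two strings by applying edits1 twice is simple but produces tens of thousands of strings for a word of ordinary length, which is why the sketch only falls back to it when no word at distance zero or one is known. Candidates with equal frequency are not ordered in any particular way here, and the quality of the corrections depends entirely on the size and relevance of the training corpus.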