Natural Language Processing with Java and LingPipe Cookbook

Using edit distance and language models for spelling correction


Spelling correction takes text entered by a user and returns a corrected form. Most of us are familiar with automatic spelling correction from our smartphones or from editors such as Microsoft Word. The Web has plenty of amusing examples of spelling correction going wrong. In this example, we'll build our own spelling-correction engine and look at how to tune it.

LingPipe's spelling correction is based on a noisy-channel model, which models both the expected user input (based on the data) and user mistakes. Expected user input is modeled by a character language model, and mistakes (or noise) are modeled by weighted edit distance. Spelling correction is performed by the CompiledSpellChecker class. This class implements the noisy-channel model and provides an estimate of the most likely intended message, given the message actually received. We can express this through a formula in the following manner:

didYouMean(received...
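
To make the noisy-channel idea concrete, the following is a minimal, self-contained sketch in plain Java. It is not LingPipe's implementation and does not use the CompiledSpellChecker class. The class name NoisyChannelSketch, the tiny word-level "language model" (a map from candidate to log probability), and the single per-edit penalty editLogCost are all assumptions made for illustration; LingPipe itself uses a character language model and separately weighted edit operations. The sketch scores each candidate as its source log probability plus a channel log probability derived from unweighted edit distance, and returns the highest-scoring candidate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy noisy-channel corrector (illustrative only, not LingPipe's API).
final class NoisyChannelSketch {

    // Plain Levenshtein distance: insert, delete, and substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(sub, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the candidate maximizing
    //   log P(candidate) + log P(received | candidate),
    // where the channel term is approximated as editLogCost * editDistance.
    // editLogCost is a hypothetical tuning parameter and should be negative.
    static String didYouMean(String received,
                             Map<String, Double> sourceLogProb,
                             double editLogCost) {
        String best = received; // fall back to the input if there are no candidates
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : sourceLogProb.entrySet()) {
            double channel = editLogCost * editDistance(e.getKey(), received);
            double score = e.getValue() + channel;
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up probabilities standing in for a trained language model.
        Map<String, Double> lm = new LinkedHashMap<>();
        lm.put("the", Math.log(0.6));
        lm.put("ten", Math.log(0.3));
        lm.put("tea", Math.log(0.1));
        System.out.println(didYouMean("thw", lm, -4.0));
    }
}
```

Making editLogCost more negative makes the sketch more reluctant to change the input, while a value closer to zero lets the language model dominate. This is the same trade-off, in miniature, that tuning a noisy-channel spell checker involves.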