Natural Language Processing with Java and LingPipe Cookbook

Weighted edit distance


Weighted edit distance is simple edit distance in which each kind of edit operation can carry its own cost. The edit operations we identified in the previous recipe are substitution, insertion, deletion, and transposition. Additionally, a cost can be associated with exact matches to increase the weight for matching; this might be used when edits are required, as in a string-variation generator. Edit weights are generally scaled as log probabilities so that you can assign a likelihood to an edit operation. The larger the weight, the more likely that edit operation is. As probabilities are between 0 and 1, log probabilities, or weights, will be between negative infinity and zero. For more on this, refer to the Javadoc on the WeightedEditDistance class at http://alias-i.com/lingpipe/docs/api/com/aliasi/spell/WeightedEditDistance.html.
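To make the idea of per-operation weights concrete, the following is a minimal plain-Java sketch of a weighted edit distance computed by dynamic programming. It does not use LingPipe and is not how the WeightedEditDistance class is implemented; the class name WeightedEditSketch, its method names, and the specific weight values are all assumptions chosen for illustration. The weights are on the log scale, so each is zero or negative, and the best alignment is the one with the largest total weight.

```java
/**
 * Illustrative sketch only: a plain-Java weighted edit distance with
 * hypothetical log-scale weights. This is not LingPipe's implementation.
 */
final class WeightedEditSketch {

    // Hypothetical weights on the log scale, so every value is <= 0.
    // A weight w corresponds to a probability of Math.exp(w).
    static final double MATCH = 0.0;       // log(1.0): matching is free
    static final double DELETE = -1.0;
    static final double INSERT = -2.0;
    static final double SUBSTITUTE = -2.5;
    static final double TRANSPOSE = -1.5;

    /** Largest total log weight over all edit sequences turning from into to. */
    static double proximity(String from, String to) {
        int n = from.length();
        int m = to.length();
        double[][] best = new double[n + 1][m + 1];

        // Turning a prefix into the empty string needs only deletions.
        for (int i = 1; i <= n; i++) {
            best[i][0] = best[i - 1][0] + DELETE;
        }
        // Building a prefix from the empty string needs only insertions.
        for (int j = 1; j <= m; j++) {
            best[0][j] = best[0][j - 1] + INSERT;
        }

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                char a = from.charAt(i - 1);
                char b = to.charAt(j - 1);

                // Match or substitute the current pair of characters.
                double score = best[i - 1][j - 1] + (a == b ? MATCH : SUBSTITUTE);
                // Delete the current character of the source string.
                score = Math.max(score, best[i - 1][j] + DELETE);
                // Insert the current character of the target string.
                score = Math.max(score, best[i][j - 1] + INSERT);
                // Swap two adjacent characters.
                if (i > 1 && j > 1
                        && a == to.charAt(j - 2)
                        && from.charAt(i - 2) == b) {
                    score = Math.max(score, best[i - 2][j - 2] + TRANSPOSE);
                }
                best[i][j] = score;
            }
        }
        return best[n][m];
    }

    /** Distance is the negated proximity, so it is zero or positive. */
    static double distance(String from, String to) {
        // Subtract from 0.0 rather than negate, to avoid returning -0.0.
        return 0.0 - proximity(from, to);
    }
}
```

With these assumed weights, calling WeightedEditSketch.distance("ab", "ba") selects the single transposition over the alternatives of two substitutions or a deletion plus an insertion, because the transposition has the largest log weight. Changing the constants changes which edit sequence is preferred, which is the point of weighting the operations separately.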

On the log scale, weighted edit distance can be generalized to produce exactly...