It is often useful to be able to measure some form of distance between two points. In previous chapters, we used the distance between points to aid in clustering and classification. We can do the same for words and passages in NLP. The problem, of course, is that words are made up of letters, while distances are numbers. So how do we derive a number from two words?
Enter Levenshtein distance: a simple metric that counts the minimum number of single-character edits needed to transform one string into another. The Levenshtein distance allows insertions, deletions, and substitutions. A modification of the Levenshtein distance, called the Damerau-Levenshtein distance, also allows transpositions, that is, swapping two neighboring letters.
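As a minimal sketch of how these two metrics can be computed, the Python functions below use the standard dynamic-programming approach. The function names levenshtein and osa_distance are illustrative choices, not taken from any library. Note that the second function implements the restricted form of Damerau-Levenshtein (often called optimal string alignment), in which a swapped pair of letters is not edited again afterward; the unrestricted variant needs more bookkeeping.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # and the first j characters of b. It starts as the empty prefix of a.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match for free
            ))
        previous = current
    return previous[-1]


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein: also counts adjacent swaps as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # i deletions to reach the empty string
    for j in range(cols):
        d[0][j] = j  # j insertions starting from the empty string
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition: the last two characters of each prefix are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For instance, levenshtein("kitten", "sitting") returns 3 (two substitutions and one insertion). For the pair "ab" and "ba", levenshtein returns 2, while osa_distance returns 1 because the swap counts as a single edit.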
To illustrate this concept with an example, let's try transforming the word crate into the word...