In this chapter, we will cover the following recipes:
Distance and proximity – simple edit distance
Weighted edit distance
The Jaccard distance
The TF-IDF distance
Using edit distance and language models for spelling correction
The case-restoring corrector
Automatic phrase completion
Single-link and complete-link clustering using edit distance
Latent Dirichlet allocation (LDA) for multitopic clustering