Natural Language Processing with Java and LingPipe Cookbook

Chapter 6. String Comparison and Clustering

In this chapter, we will cover the following recipes:

  • Distance and proximity – simple edit distance

  • Weighted edit distance

  • The Jaccard distance

  • The Tf-Idf distance

  • Using edit distance and language models for spelling correction

  • The case restoring corrector

  • Automatic phrase completion

  • Single-link and complete-link clustering using edit distance

  • Latent Dirichlet allocation (LDA) for multitopic clustering