
Natural Language Processing with Java and LingPipe Cookbook


Single-link and complete-link clustering using edit distance


Clustering is the process of grouping a collection of objects by their similarity, as judged by some distance measure. The idea is that objects within a cluster lie close to each other, while objects in different clusters lie farther apart. Very broadly, clustering techniques can be divided into hierarchical (or agglomerative) and divisional techniques. Hierarchical techniques start by treating every object as its own cluster and then repeatedly merge clusters until a stopping criterion has been met.
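The bottom-up merging loop can be illustrated with a minimal sketch in plain Java. This is not LingPipe's API. It assumes Levenshtein edit distance, in keeping with this recipe's title, and single-link cluster distance, where the distance between two clusters is the smallest edit distance between any of their members. A hypothetical `maxDistance` threshold serves as the stopping criterion. The class and method names (`AgglomerativeSketch`, `editDistance`, `singleLink`, `cluster`) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in, not LingPipe's implementation: bottom-up clustering
// with single-link merging, Levenshtein distance, and a distance threshold.
class AgglomerativeSketch {

    // Levenshtein distance: each insertion, deletion, or substitution costs 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    // Single-link distance: smallest edit distance between any two members.
    static int singleLink(List<String> c1, List<String> c2) {
        int best = Integer.MAX_VALUE;
        for (String x : c1)
            for (String y : c2)
                best = Math.min(best, editDistance(x, y));
        return best;
    }

    // Start with one cluster per item; merge the closest pair until even the
    // closest pair is farther apart than maxDistance (hypothetical threshold).
    static List<List<String>> cluster(List<String> items, int maxDistance) {
        List<List<String>> clusters = new ArrayList<>();
        for (String s : items) clusters.add(new ArrayList<>(List.of(s)));
        while (clusters.size() > 1) {
            int bestI = -1, bestJ = -1, bestD = Integer.MAX_VALUE;
            for (int i = 0; i < clusters.size(); i++)
                for (int j = i + 1; j < clusters.size(); j++) {
                    int d = singleLink(clusters.get(i), clusters.get(j));
                    if (d < bestD) { bestD = d; bestI = i; bestJ = j; }
                }
            if (bestD > maxDistance) break; // stopping criterion met
            // bestJ > bestI, so removing bestJ leaves index bestI valid.
            List<String> merged = clusters.remove(bestJ);
            clusters.get(bestI).addAll(merged);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> words = List.of("cat", "cats", "hat", "dog", "dogs", "doge");
        System.out.println(cluster(words, 1)); // threshold of 1 edit
    }
}
```

With a threshold of one edit, `main` prints `[[cat, cats, hat], [dog, dogs, doge]]`. The two groups are never merged because every cross-group pair differs by at least three edits. The all-pairs search recomputes distances on every pass, so this sketch only suits small inputs.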

For hierarchical techniques, a stopping criterion can be a fixed distance threshold: merging stops once every pair of remaining clusters is farther apart than that distance. Divisional techniques go the other way. They start with all the objects in a single cluster and split it until a stopping criterion, such as reaching a target number of clusters, has been met.
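The top-down direction can be sketched in the same style. This is again plain Java rather than LingPipe. The splitting rule is a simple heuristic assumed for illustration, not a standard named algorithm: pick the cluster with the largest diameter (its largest internal edit distance), seed two new clusters with its two farthest-apart members, and assign every member to the nearer seed. The stopping criterion is a hypothetical target cluster count `k`. The names `DivisiveSketch`, `diameter`, `split`, and `cluster` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in, not LingPipe's implementation: top-down clustering
// that splits the widest cluster until there are k clusters.
class DivisiveSketch {

    // Levenshtein distance: each insertion, deletion, or substitution costs 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    // Diameter: the largest pairwise edit distance inside a cluster.
    static int diameter(List<String> c) {
        int max = 0;
        for (int i = 0; i < c.size(); i++)
            for (int j = i + 1; j < c.size(); j++)
                max = Math.max(max, editDistance(c.get(i), c.get(j)));
        return max;
    }

    // Assumed heuristic: seed with the two farthest-apart members, then give
    // each member to the nearer seed (ties go to the first seed).
    static List<List<String>> split(List<String> c) {
        String seedA = c.get(0), seedB = c.get(1);
        int far = -1;
        for (int i = 0; i < c.size(); i++)
            for (int j = i + 1; j < c.size(); j++) {
                int d = editDistance(c.get(i), c.get(j));
                if (d > far) { far = d; seedA = c.get(i); seedB = c.get(j); }
            }
        List<String> a = new ArrayList<>();
        List<String> b = new ArrayList<>();
        for (String s : c) {
            if (editDistance(s, seedA) <= editDistance(s, seedB)) a.add(s);
            else b.add(s);
        }
        List<List<String>> parts = new ArrayList<>();
        parts.add(a);
        parts.add(b);
        return parts;
    }

    // Start with everything in one cluster; split until there are k clusters
    // or no cluster has two distinct members left to separate.
    static List<List<String>> cluster(List<String> items, int k) {
        List<List<String>> clusters = new ArrayList<>();
        clusters.add(new ArrayList<>(items));
        while (clusters.size() < k) {
            int widest = -1, widestD = 0;
            for (int i = 0; i < clusters.size(); i++) {
                int d = diameter(clusters.get(i));
                if (d > widestD) { widestD = d; widest = i; }
            }
            if (widest < 0) break; // every cluster has diameter 0
            clusters.addAll(split(clusters.remove(widest)));
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> words = List.of("cat", "cats", "hat", "dog", "dogs", "doge");
        System.out.println(cluster(words, 2)); // target of 2 clusters
    }
}
```

With `k = 2`, `main` prints `[[cat, cats, hat], [dog, dogs, doge]]`. On this data it reaches the same grouping as the bottom-up sketch, but it gets there by splitting rather than merging.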

We will review hierarchical techniques in the next few recipes. The two LingPipe clustering implementations we will cover are...