Natural Language Processing with Java and LingPipe Cookbook

Named entity coreference with a document


As seen in Chapter 5, Finding Spans in Text – Chunking, LingPipe can use a variety of techniques to recognize proper nouns that correspond to persons, places, things, genes, and so on. However, chunking doesn't quite finish the job, because it doesn't tell us when two named-entity mentions refer to the same entity. Being able to say that John Smith is the same entity as Mr. Smith, John, or even an exact repeat, John Smith, can be very useful; so useful that the idea was the basis of our company when we were a baby defense contractor. Our novel contribution was the generation of sentences indexed by the entities they mentioned, which turned out to be an excellent way to summarize what was being said about an entity, particularly if the mapping spanned languages. We call it entity-based summarization.
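To make the idea of sentences indexed by entity concrete, here is a minimal sketch in plain Java. It is not LingPipe code and not the authors' implementation: the class `EntitySentenceIndex`, its methods, and the example sentences are hypothetical, and the entity IDs are assigned by hand here, standing in for whatever a coreference step would produce.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: sentences indexed by the entity IDs they mention. */
class EntitySentenceIndex {

    // Entity ID -> sentences that mention the entity, in the order they were added.
    private final Map<Integer, List<String>> sentencesByEntity = new LinkedHashMap<>();

    /** Records that the given sentence mentions the entity with the given ID. */
    public void add(int entityId, String sentence) {
        List<String> sentences =
            sentencesByEntity.computeIfAbsent(entityId, k -> new ArrayList<>());
        // Skip sentences already stored for this entity.
        if (!sentences.contains(sentence)) {
            sentences.add(sentence);
        }
    }

    /** Returns the sentences mentioning the entity, or an empty list if there are none. */
    public List<String> sentencesFor(int entityId) {
        return sentencesByEntity.getOrDefault(entityId, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        EntitySentenceIndex index = new EntitySentenceIndex();

        // Assumption: a coreference step has already decided that "John Smith"
        // and "Mr. Smith" are entity 0 and that "Washington" is entity 1.
        index.add(0, "John Smith went to Washington.");
        index.add(1, "John Smith went to Washington.");
        index.add(0, "Mr. Smith is a senator.");

        // An entity-based summary is then the list of sentences for one ID.
        for (String sentence : index.sentencesFor(0)) {
            System.out.println(sentence);
        }
    }
}
```

The point of the design is that all the hard work sits in assigning the same ID to different mentions of one entity; once that is done, the summary is a simple lookup.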

Note

The idea for entity-based summarization came about as a result of a talk Baldwin gave at the University of Pennsylvania at a graduate student seminar...