As seen in Chapter 5, Finding Spans in Text – Chunking, LingPipe can use a variety of techniques to recognize proper nouns that correspond to persons, places, things, genes, and so on. However, chunking doesn't quite finish the job, because it doesn't tell us when two mentions refer to the same entity. Being able to say that John Smith is the same entity as Mr. Smith, John, or even an exact repeat, John Smith, can be very useful; so useful that the idea was the basis of our company when we were a baby defense contractor. Our novel contribution was the generation of sentences indexed by the entities they mentioned, which turned out to be an excellent way to summarize what was being said about an entity, particularly if the mapping spanned languages. We call this entity-based summarization.
Natural Language Processing with Java and LingPipe Cookbook