Natural Language Processing with Java and LingPipe Cookbook

Annotation


One of the most valuable services we provide is teaching our customers how to create gold-standard data, also known as training data. Nearly every successful NLP project we have done has involved a good deal of customer-driven annotation. The quality of the NLP is entirely dependent on the quality of the training data. Creating training data is a fairly straightforward process, but it requires attention to detail and significant resources. From a budget perspective, you can expect to spend as much on annotation as on the development team, if not more.

How to do it...

We will use sentiment over tweets as our example and assume a business context, although academic efforts will have similar dimensions.

  1. Get 10 examples of what you expect the system to do. For our example, this means collecting 10 tweets that reflect the scope of the system's expected behavior.

  2. Make some effort to pick from the range of what you expect as inputs/outputs. Feel free to cherry-pick strong examples...
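One way to keep the first two steps honest is to record each hand-picked tweet together with the output you expect, and then check that every output category appears at least once. The following is a minimal sketch under stated assumptions: the three-way label scheme, the `SeedSetCheck` class, and the sample tweets are all hypothetical and are not part of LingPipe or of the recipe itself.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for sanity-checking a hand-picked seed set of tweets. */
public class SeedSetCheck {

    /** Assumed label scheme; the recipe does not fix one. */
    public enum Sentiment { POSITIVE, NEGATIVE, NEUTRAL }

    /** One hand-picked example: the input text and the output we expect. */
    public static final class Example {
        final String text;
        final Sentiment expected;

        public Example(String text, Sentiment expected) {
            this.text = text;
            this.expected = expected;
        }
    }

    /** Counts how many seed examples carry each label (zero if none). */
    public static Map<Sentiment, Integer> countByLabel(List<Example> seed) {
        Map<Sentiment, Integer> counts = new EnumMap<>(Sentiment.class);
        for (Sentiment s : Sentiment.values()) {
            counts.put(s, 0);
        }
        for (Example e : seed) {
            counts.merge(e.expected, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the labels that no seed example covers. */
    public static Set<Sentiment> missingLabels(List<Example> seed) {
        Set<Sentiment> missing = EnumSet.allOf(Sentiment.class);
        for (Example e : seed) {
            missing.remove(e.expected);
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Example> seed = new ArrayList<>();
        // Invented tweets, for illustration only.
        seed.add(new Example("Loving the new phone, battery lasts all day", Sentiment.POSITIVE));
        seed.add(new Example("Worst customer service I have ever had", Sentiment.NEGATIVE));

        System.out.println("Examples per label: " + countByLabel(seed));
        System.out.println("Labels with no example yet: " + missingLabels(seed));
    }
}
```

With only a positive and a negative tweet in the seed set, the check flags the neutral category as uncovered, which is a prompt to go back and pick an example for it before moving on.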