Natural Language Processing with Java and LingPipe Cookbook

Table of Contents (14 chapters)
Natural Language Processing with Java and LingPipe Cookbook
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

CRFs for chunking


CRFs are best known for providing close to state-of-the-art performance on named-entity tagging. This recipe shows how to build such a system. It assumes that you have read, understood, and experimented with the Conditional random fields – CRF for word/token tagging recipe in Chapter 4, Tagging Words and Tokens, which covers the underlying technology. Like HMMs, CRFs treat named-entity detection as a word-tagging problem, with an interpretation layer that converts the tags into chunkings. Unlike HMMs, CRFs use a logistic-regression-based classification approach, which in turn allows arbitrary features to be included. There is also an excellent tutorial on CRFs at http://alias-i.com/lingpipe/demos/tutorial/crf/read-me.html, which this recipe follows closely while omitting some details. The LingPipe Javadoc is another good source of information.
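To make the "interpretation layer" concrete, here is a minimal sketch in plain Java (no LingPipe dependency) that decodes a per-token tag sequence into chunks. It assumes a BIO-style tag set in which B_TYPE begins a chunk, I_TYPE continues it, and O marks tokens outside any chunk. The class name BioChunkDecoder, the nested Chunk type, and the sample sentence and tags are hypothetical illustrations; this is not LingPipe's own chunk codec.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of an interpretation layer that turns a per-token
 * tag sequence into chunks. Assumed BIO-style tags: "B_TYPE" begins a chunk,
 * "I_TYPE" continues it, "O" is outside any chunk.
 * This is NOT LingPipe's codec; it only illustrates the idea.
 */
class BioChunkDecoder {

    /** A chunk covering tokens [start, end) with an entity type. */
    public static final class Chunk {
        public final int start;   // index of the first token (inclusive)
        public final int end;     // index one past the last token (exclusive)
        public final String type; // entity type, e.g. "PER"

        Chunk(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ")";
        }
    }

    /** Decodes a BIO tag sequence into chunks over token indices. */
    public static List<Chunk> decode(String[] tags) {
        List<Chunk> chunks = new ArrayList<>();
        int start = -1;      // start index of the open chunk, or -1 if none
        String type = null;  // type of the open chunk, or null if none
        for (int i = 0; i < tags.length; i++) {
            String tag = tags[i];
            // An I_ tag of the same type simply extends the open chunk.
            if (tag.startsWith("I_") && type != null && tag.substring(2).equals(type)) {
                continue;
            }
            // Any other tag closes the open chunk, if there is one.
            if (type != null) {
                chunks.add(new Chunk(start, i, type));
                type = null;
                start = -1;
            }
            // B_ opens a new chunk; a stray I_ is leniently treated like B_.
            if (tag.startsWith("B_") || tag.startsWith("I_")) {
                start = i;
                type = tag.substring(2);
            }
            // "O" leaves no chunk open.
        }
        // Close a chunk that runs to the end of the sequence.
        if (type != null) {
            chunks.add(new Chunk(start, tags.length, type));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hypothetical tokens and tags, as a tagger might output them.
        String[] tokens = {"John", "Smith", "lives", "in", "Washington", "."};
        String[] tags   = {"B_PER", "I_PER", "O", "O", "B_LOC", "O"};
        for (Chunk c : decode(tags)) {
            StringBuilder text = new StringBuilder();
            for (int i = c.start; i < c.end; i++) {
                if (i > c.start) text.append(' ');
                text.append(tokens[i]);
            }
            System.out.println(c + " " + text);
        }
    }
}
```

Running main prints PER[0,2) John Smith followed by LOC[4,5) Washington. The decoder is deliberately lenient: an I_ tag that does not continue an open chunk of the same type starts a new chunk instead of raising an error; a production codec might reject such sequences instead.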

Getting ready

Just as we did earlier, we will use a small hand-coded corpus to serve as training data. The corpus is in src/com/lingpipe/cookbook...