Natural Language Processing with Java and LingPipe Cookbook

Conditional random fields (CRF) for word/token tagging


Conditional random fields (CRF) are an extension of the Logistic regression recipe in Chapter 3, Advanced Classifiers, but are applied to word tagging. At the end of Chapter 1, Simple Classifiers, we discussed various ways to encode a problem into a classification problem. CRFs treat the sequence tagging problem as finding the best category, where each category is one complete assignment of tags to tokens. With T possible tags and N tokens, there are T^N such categories.

For example, if we have the tokens The and rain, and the tags d for determiner and n for noun, then the set of categories for the CRF classifier is:

  • Category 1: d d

  • Category 2: n d

  • Category 3: n n

  • Category 4: d n
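To make the encoding concrete, here is a minimal sketch that enumerates every tag assignment for a token sequence. The class and method names (`TagAssignmentEnumerator`, `enumerate`) are hypothetical and are not part of LingPipe; the sketch only illustrates how the category set is formed and how quickly it grows. The order in which assignments are produced is an artifact of the recursion, not something the text prescribes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of LingPipe.
public class TagAssignmentEnumerator {

    // Returns every assignment of one tag per token as a space-separated string.
    // For T tags and N tokens the returned list has T^N entries.
    public static List<String> enumerate(String[] tags, int numTokens) {
        List<String> assignments = new ArrayList<>();
        build(tags, numTokens, "", assignments);
        return assignments;
    }

    // Extends the partial assignment in prefix by one tag at a time.
    private static void build(String[] tags, int remaining,
                              String prefix, List<String> out) {
        if (remaining == 0) {
            out.add(prefix.trim());
            return;
        }
        for (String tag : tags) {
            build(tags, remaining - 1, prefix + " " + tag, out);
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "rain"};
        String[] tags = {"d", "n"};
        // Two tags over two tokens yields 2^2 = 4 candidate categories.
        for (String assignment : enumerate(tags, tokens.length)) {
            System.out.println(assignment);
        }
    }
}
```

With a realistic tag set the list becomes unmanageable very quickly: 3 tags over 3 tokens already gives 27 assignments, which is why nobody enumerates them explicitly in practice.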

Various optimizations are applied to keep this combinatorial nightmare computable, but this is the general idea. Crazy, but it works.
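The text does not say which optimizations are used, so the following is only one well-known possibility: Viterbi-style dynamic programming, a standard way to find the best tag sequence in chain-structured models without scoring every sequence. The sketch assumes the score of a sequence decomposes into per-token scores and adjacent-tag scores. The class name, the three score arrays, and the numbers in `main` are all made up for illustration and do not come from LingPipe.

```java
// Hypothetical illustration class; not LingPipe's implementation.
public class ViterbiSketch {

    // Finds the highest-scoring tag sequence in O(N * T^2) time
    // instead of scoring all T^N sequences.
    //   start[t]    : assumed score for beginning the sequence with tag t
    //   emit[i][t]  : assumed score for assigning tag t to token i
    //   trans[p][t] : assumed score for tag t following tag p
    public static int[] bestPath(double[] start, double[][] emit, double[][] trans) {
        int n = emit.length;
        int numTags = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] best = new double[n][numTags]; // best score ending in tag t at token i
        int[][] back = new int[n][numTags];       // previous tag that achieved it
        for (int t = 0; t < numTags; t++) {
            best[0][t] = start[t] + emit[0][t];
        }
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < numTags; t++) {
                best[i][t] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < numTags; p++) {
                    double score = best[i - 1][p] + trans[p][t] + emit[i][t];
                    if (score > best[i][t]) {
                        best[i][t] = score;
                        back[i][t] = p;
                    }
                }
            }
        }
        // Pick the best final tag, then follow back-pointers to the start.
        int last = 0;
        for (int t = 1; t < numTags; t++) {
            if (best[n - 1][t] > best[n - 1][last]) {
                last = t;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int i = n - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];
        }
        return path;
    }

    public static void main(String[] args) {
        String[] tags = {"d", "n"};
        // Made-up scores for the tokens "The" and "rain".
        double[] start = {0.5, 0.0};
        double[][] emit = {{2.0, 0.0}, {0.0, 1.5}};
        double[][] trans = {{-1.0, 1.0}, {0.0, 0.0}};
        for (int tagIndex : bestPath(start, emit, trans)) {
            System.out.println(tags[tagIndex]);
        }
    }
}
```

The key design point is that each token only needs the best score for each possible previous tag, so the work grows linearly with the number of tokens rather than exponentially.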

CRFs also allow arbitrary features to be used in training in exactly the same way that logistic regression does for classification. Additionally, it has data structures optimized for HMM...