Natural Language Processing with Java and LingPipe Cookbook

Feature extractors


Up until now, we have been using characters and words to train our models. We are about to introduce a classifier (logistic regression) that allows other observations about the data to inform the classifier, for example, whether a word is actually a date. Feature extractors are also used in CRF taggers and K-means clustering. This recipe introduces feature extractors independently of any technology that uses them.
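To make the idea concrete without depending on LingPipe, here is a minimal plain-Java sketch of a feature extractor: something that maps an input to named numeric features. The interface `SimpleExtractor`, the field `WORDS_AND_DATES`, and the ISO-date rule are hypothetical illustrations, not the book's code or LingPipe's API. LingPipe's own `com.aliasi.util.FeatureExtractor` interface has a similar shape, mapping an input to a `Map<String,? extends Number>`.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class DateAwareFeatureSketch {

    // Hypothetical interface: maps an input object to named numeric features.
    interface SimpleExtractor<E> {
        Map<String, Integer> features(E in);
    }

    // Hypothetical rule: a token such as 2014-03-01 is treated as a date.
    private static final Pattern ISO_DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Emits one feature per whitespace-separated token, plus an extra DATE
    // feature counting the tokens that look like ISO dates.
    static final SimpleExtractor<String> WORDS_AND_DATES = text -> {
        Map<String, Integer> feats = new TreeMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // "".split(...) yields a single empty token
            }
            feats.merge(token, 1, Integer::sum);
            if (ISO_DATE.matcher(token).matches()) {
                feats.merge("DATE", 1, Integer::sum);
            }
        }
        return feats;
    };

    public static void main(String[] args) {
        Map<String, Integer> feats = WORDS_AND_DATES.features("Released on 2014-03-01");
        // TreeMap iterates keys in sorted order.
        feats.forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

The DATE feature is the kind of extra observation the paragraph describes: a classifier consuming these features can learn from "this token is a date" even if it never saw that particular date string during training.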

How to do it...

There is not much to this recipe, but the upcoming Logistic regression recipe has many moving parts, and this is one of them.

  1. Fire up your IDE or type in the command line:

    java -cp lingpipe-cookbook.1.0.jar:lib/lingpipe-4.1.0.jar com.lingpipe.cookbook.chapter3.SimpleFeatureExtractor
    
  2. Type a string into our standard I/O loop:

    Type a string to see its features
    My first feature extraction!
  3. Features are then produced:

    !=1
    My=1
    extraction=1
    feature=1
    first=1
  4. Note that there is no order information here. Does it keep a count or not?

    Type a string to see its...
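Step 4 asks whether the extractor keeps a count. As a point of comparison, here is a hypothetical plain-Java bag-of-words extractor that does count repeated tokens. It is not the book's `SimpleFeatureExtractor`; the class name, the regex tokenizer (runs of letters or digits, or single punctuation marks), and the use of a `TreeMap` are all assumptions. Because a `TreeMap` sorts its keys, it happens to print the features for "My first feature extraction!" in the same order as step 3, and, like that output, it carries no word-order information.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountingBagOfWordsSketch {

    // Hypothetical tokenizer: a run of letters/digits, or one punctuation mark.
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    // Bag-of-words features: token -> number of occurrences.
    // Keys are sorted, so the original word order is discarded.
    static Map<String, Integer> features(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] inputs = {"My first feature extraction!", "the cat saw the dog"};
        for (String input : inputs) {
            System.out.println(input);
            features(input).forEach((token, count) ->
                System.out.println("  " + token + "=" + count));
        }
    }
}
```

With this sketch, "the cat saw the dog" yields `the=2`. Running the book's class on a sentence with a repeated word is the direct way to answer step 4's question for `SimpleFeatureExtractor` itself.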