Natural Language Processing with Java and LingPipe Cookbook

Using Lucene/Solr tokenizers with LingPipe


We can use Lucene tokenizers with LingPipe, which is useful because Lucene provides such a rich set of them. We are going to show how to wrap a Lucene TokenStream in a LingPipe TokenizerFactory by extending LingPipe's Tokenizer abstract class.
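
Before turning to the real classes, the shape of the wrapping can be sketched in plain Java. This is a minimal illustration of the adapter idea only, not the recipe's implementation: SketchTokenStream, SketchTokenizer, StreamBackedTokenizer, and AdapterSketch are hypothetical stand-ins invented for this sketch, and the toy stream simply lowercases and splits on non-alphanumeric characters, which is not how Lucene's StandardAnalyzer tokenizes. We assume only the general contract that a Lucene-style stream is advanced one token at a time, while a LingPipe-style tokenizer returns the next token or null when it is done.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a Lucene-style stream: advance first, then read the term.
interface SketchTokenStream {
    boolean incrementToken();   // moves to the next token; false when exhausted
    String currentTerm();       // text of the token most recently advanced to
}

// Hypothetical stand-in for a LingPipe-style tokenizer base class.
abstract class SketchTokenizer {
    // Returns the next token, or null when there are no more tokens.
    abstract String nextToken();

    // Drains the remaining tokens into a list.
    List<String> tokenize() {
        List<String> out = new ArrayList<>();
        for (String t = nextToken(); t != null; t = nextToken()) {
            out.add(t);
        }
        return out;
    }
}

// The adapter: extends the tokenizer base class and delegates to the wrapped stream.
class StreamBackedTokenizer extends SketchTokenizer {
    private final SketchTokenStream stream;

    StreamBackedTokenizer(SketchTokenStream stream) {
        this.stream = stream;
    }

    @Override
    String nextToken() {
        return stream.incrementToken() ? stream.currentTerm() : null;
    }
}

class AdapterSketch {
    // Toy stream (an assumption for illustration): lowercase, split on non-alphanumerics.
    static SketchTokenStream toyStream(String text) {
        List<String> terms = new ArrayList<>();
        for (String s : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!s.isEmpty()) {
                terms.add(s);
            }
        }
        final Iterator<String> it = terms.iterator();
        return new SketchTokenStream() {
            private String current;

            public boolean incrementToken() {
                if (!it.hasNext()) {
                    return false;
                }
                current = it.next();
                return true;
            }

            public String currentTerm() {
                return current;
            }
        };
    }

    // Wraps the toy stream in the adapter and collects all tokens.
    static List<String> tokens(String text) {
        return new StreamBackedTokenizer(toyStream(text)).tokenize();
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hi how are you?")); // prints [hi, how, are, you]
    }
}
```

The point of the design is that callers see only the tokenizer side of the adapter, so any stream that honors the advance-then-read contract can be swapped in without changing the calling code.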

How to do it...

We will shake things up a bit and have a recipe that is not interactive. Perform the following steps:

  1. Invoke the LuceneAnalyzerTokenizerFactory class from the command line:

    java -cp lingpipe-cookbook.1.0.jar:lib/lucene-analyzers-common-4.6.0.jar:lib/lucene-core-4.6.0.jar:lib/lingpipe-4.1.0.jar com.lingpipe.cookbook.chapter2.LuceneAnalyzerTokenizerFactory
    
  2. The main() method in the class specifies the input:

    String text = "Hi how are you? " + "Are the numbers 1 2 3 4.5 all integers?";
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
    TokenizerFactory tokFactory = new LuceneAnalyzerTokenizerFactory(analyzer, "DEFAULT");
    Tokenizer tokenizer = tokFactory.tokenizer(text.toCharArray(), 0, text.length...