We can use these Lucene tokenizers with LingPipe; this is useful because Lucene has such a rich set of them. We are going to show how to wrap a Lucene TokenStream into a LingPipe TokenizerFactory by extending the Tokenizer abstract class.
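To make the wrapping idea concrete before the recipe itself, here is a minimal, self-contained sketch of the adapter pattern involved. It does not use the real Lucene or LingPipe classes: PullStream, SimpleTokenizer, LowercaseWordStream, and StreamBackedTokenizer are hypothetical stand-ins invented for illustration. They only mirror the general shape of the two APIs, assuming a Lucene-style stream that is advanced with incrementToken() and a LingPipe-style tokenizer whose nextToken() returns null when the input is exhausted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenStreamAdapterSketch {

    /** Hypothetical stand-in for a Lucene-style stream: advance first, then read the current term. */
    interface PullStream {
        boolean incrementToken();
        String currentTerm();
    }

    /** Hypothetical stand-in for a LingPipe-style tokenizer: nextToken() returns null at the end. */
    abstract static class SimpleTokenizer {
        abstract String nextToken();
    }

    /** Toy stream (an assumption, not a Lucene analyzer): lowercases and splits on non-word characters. */
    static final class LowercaseWordStream implements PullStream {
        private final String[] parts;
        private int next = 0;
        private String current;

        LowercaseWordStream(String text) {
            this.parts = text.toLowerCase(Locale.ROOT).split("\\W+");
        }

        @Override
        public boolean incrementToken() {
            // Skip the empty strings that split() can produce for empty or punctuation-led input.
            while (next < parts.length) {
                String candidate = parts[next++];
                if (!candidate.isEmpty()) {
                    current = candidate;
                    return true;
                }
            }
            return false;
        }

        @Override
        public String currentTerm() {
            return current;
        }
    }

    /** The adapter: extends the tokenizer base class and delegates each call to the wrapped stream. */
    static final class StreamBackedTokenizer extends SimpleTokenizer {
        private final PullStream stream;

        StreamBackedTokenizer(PullStream stream) {
            this.stream = stream;
        }

        @Override
        String nextToken() {
            return stream.incrementToken() ? stream.currentTerm() : null;
        }
    }

    /** Drains the adapted tokenizer into a list, stopping at the null end marker. */
    public static List<String> tokenize(String text) {
        SimpleTokenizer tokenizer = new StreamBackedTokenizer(new LowercaseWordStream(text));
        List<String> tokens = new ArrayList<>();
        for (String token; (token = tokenizer.nextToken()) != null; ) {
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hi how are you?")); // prints [hi, how, are, you]
    }
}
```

The point of the sketch is the single delegating method in StreamBackedTokenizer: the stream's "advance, then read" protocol is translated into the tokenizer's "return the next token or null" protocol, so calling code only ever sees the tokenizer interface.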
We will shake things up a bit and have a recipe that is not interactive. Perform the following steps:
Invoke the LuceneAnalyzerTokenizerFactory class from the command line:
    java -cp lingpipe-cookbook.1.0.jar:lib/lucene-analyzers-common-4.6.0.jar:lib/lucene-core-4.6.0.jar:lib/lingpipe-4.1.0.jar com.lingpipe.cookbook.chapter2.LuceneAnalyzerTokenizerFactory
The main() method in the class specifies the input:
    String text = "Hi how are you? " + "Are the numbers 1 2 3 4.5 all integers?";
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
    TokenizerFactory tokFactory = new LuceneAnalyzerTokenizerFactory(analyzer, "DEFAULT");
    Tokenizer tokenizer = tokFactory.tokenizer(text.toCharArray(), 0, text.length...