Feature extractors can be combined in much the same way as tokenizers in Chapter 2, Finding and Working with Words.
This recipe will show you how to combine the feature extractor from the previous recipe with a very common feature extractor over character n-grams.
We will start with a main() method in src/com/lingpipe/cookbook/chapter3/CombinedFeatureExtractor.java that we will use to run the feature extractor. The following lines set up features that result from the tokenizer using the LingPipe class, TokenFeatureExtractor:
public static void main(String[] args) {
    int min = 2;
    int max = 4;
    TokenizerFactory tokenizerFactory = new NGramTokenizerFactory(min, max);
    FeatureExtractor<CharSequence> tokenFeatures = new TokenFeatureExtractor(tokenizerFactory);
Then, we will construct the feature extractor from the previous recipe:
    FeatureExtractor<CharSequence> numberFeatures = new ContainsNumberFeatureExtractor();
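To make the two feature sets concrete, here is a minimal plain-Java sketch that does not use LingPipe at all. It counts character n-grams of lengths 2 to 4, mirroring the min and max values above, and merges them with a single number feature. The class CombinedFeatureSketch, its method names, and the CONTAINS_NUMBER feature name are all hypothetical; in particular, the digit check is only an assumed stand-in for what ContainsNumberFeatureExtractor does, and summing values on merge is one plausible way to combine feature maps rather than a statement of how LingPipe does it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: plain Java, no LingPipe classes involved.
public class CombinedFeatureSketch {

    // Counts every character n-gram whose length is between min and max (inclusive).
    static Map<String, Integer> charNGramFeatures(CharSequence text, int min, int max) {
        Map<String, Integer> features = new LinkedHashMap<>();
        for (int n = min; n <= max; n++) {
            for (int start = 0; start + n <= text.length(); start++) {
                String gram = text.subSequence(start, start + n).toString();
                features.merge(gram, 1, Integer::sum);
            }
        }
        return features;
    }

    // Assumed stand-in for the previous recipe's extractor:
    // emits CONTAINS_NUMBER=1 if the text has at least one digit, otherwise nothing.
    static Map<String, Integer> containsNumberFeatures(CharSequence text) {
        Map<String, Integer> features = new LinkedHashMap<>();
        for (int i = 0; i < text.length(); i++) {
            if (Character.isDigit(text.charAt(i))) {
                features.put("CONTAINS_NUMBER", 1);
                break;
            }
        }
        return features;
    }

    // Merges both feature maps, summing values when a feature name appears in both.
    static Map<String, Integer> combinedFeatures(CharSequence text, int min, int max) {
        Map<String, Integer> combined = charNGramFeatures(text, min, max);
        containsNumberFeatures(text)
                .forEach((name, value) -> combined.merge(name, value, Integer::sum));
        return combined;
    }

    public static void main(String[] args) {
        // prints {ab=1, b1=1, ab1=1, CONTAINS_NUMBER=1}
        System.out.println(combinedFeatures("ab1", 2, 4));
    }
}
```

The n-gram features describe what the text looks like at the character level, while the number feature adds one piece of targeted knowledge; the combined map simply carries both.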
Next, the LingPipe...