Like tokenizers, filters consume a stream of tokens and produce a stream of tokens. The difference lies in where their input comes from: a filter receives tokens from a tokenizer (or from a preceding filter), examines each token, and decides whether to keep it, change or replace it, or discard it. Filters also derive from org.apache.lucene.analysis.TokenStream.
A typical example of a filter looks something like this:
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Filters are configured in schema.xml with a <filter> element as a child of <analyzer>, placed after the <tokenizer> element. Since filters take token streams as input, a filter definition must follow the tokenizer or another filter, and filters are applied in the order in which they appear.
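Because filters run in order, their sequence matters. As an illustrative sketch (the fieldType name and the stopwords.txt file are assumptions, not from the original), the chain below lowercases tokens before removing stop words, so that stop-word matching is case-insensitive:

```
<fieldType name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase first so "The" and "the" match the same stop-word entry -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Remove common stop words listed in stopwords.txt -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

If the two filters were swapped, capitalized stop words such as "The" would survive filtering, since StopFilterFactory would see them before lowercasing.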