Token filters are declared in <filter>
elements. Each consumes one stream of tokens, known as a
TokenStream, and generates another; hence, they can be chained one after another indefinitely. A token filter may perform complex analysis by processing multiple tokens in the stream at once, but in most cases it processes each token sequentially and decides whether to keep, replace, or ignore it.
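To illustrate the chaining, an analyzer might lowercase each token and then drop stopwords. The following fragment is a hypothetical sketch (the field type it would belong to, and the stopwords.txt file, are assumed):

<analyzer>
  <!-- The tokenizer produces the initial TokenStream... -->
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- ...then each filter consumes one stream and produces another -->
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
</analyzer>

The order matters: lowercasing before stopword removal means the stopword list can be matched case-insensitively against already-normalized tokens.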
There may be only one official tokenizer in an analyzer; however, the token filter named WordDelimiterFilter
is, in effect, a tokenizer too:
<filter class="solr.WordDelimiterFilterFactory"
    generateWordParts="1" generateNumberParts="1"
    catenateWords="1" catenateNumbers="1"
    catenateAll="0" splitOnCaseChange="1"/>
(Not all of its options are shown here.) The purpose of this filter is to both split and join compound words, with various means of defining what constitutes a compound word. It is typically used with WhitespaceTokenizer
, not StandardTokenizer
, which removes punctuation-based intra...
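A complete field type pairing WhitespaceTokenizer with this filter might look as follows. This is an illustrative sketch, not a fragment of any shipped schema; the field type name text_example is hypothetical:

<fieldType name="text_example" class="solr.TextField">
  <analyzer>
    <!-- WhitespaceTokenizer keeps punctuation intact so
         WordDelimiterFilter can act on it -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With these options, a token like Wi-Fi would typically yield the parts Wi and Fi plus the catenated WiFi, and splitOnCaseChange would split PowerShot into Power and Shot while catenateWords re-adds PowerShot, so queries match both the split and joined forms.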