A tokenizer breaks input text into tokens, where each token is usually a sub-sequence of the characters in the text. You configure the tokenizer for a text field type in schema.xml with a <tokenizer> element, which is a child of <analyzer>, for example:
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>
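To make the idea of a token concrete, the following is a minimal sketch in plain Java that splits text on whitespace and records where each token sits in the input. It does not use any Solr or Lucene classes, and the names WhitespaceTokenizerSketch, Token, and tokenize are hypothetical, chosen only for illustration. The tokenizers that ship with Solr apply more elaborate rules than this.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not a Solr or Lucene class.
public class WhitespaceTokenizerSketch {

    /** A token: its text plus start (inclusive) and end (exclusive) offsets in the input. */
    public static final class Token {
        public final String text;
        public final int start;
        public final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + " [" + start + "," + end + ")";
        }
    }

    /** Splits the input on whitespace; each token is a sub-sequence of the input characters. */
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= input.length(); i++) {
            boolean boundary = i == input.length() || Character.isWhitespace(input.charAt(i));
            if (!boundary && start < 0) {
                start = i; // first character of a new token
            } else if (boundary && start >= 0) {
                tokens.add(new Token(input.substring(start, i), start, i));
                start = -1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Please, email john.doe@foo.com")) {
            System.out.println(t);
        }
    }
}
```

Keeping the offsets alongside the text shows that a token refers back to a specific span of the original input, not just to a detached string.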
In the <fieldType> definition above, the class attribute names a factory class that instantiates a tokenizer object when needed. Tokenizer factory classes implement org.apache.solr.analysis.TokenizerFactory. You can pass arguments to a tokenizer factory by setting attributes on the <tokenizer> element, for example:
<fieldType name="semicolonDelimited" class="solr.TextField"> <analyzer type="query"> <tokenizer class...