One of the most common problems you have probably come across is having to split text on whitespace to separate words from each other so that they can be processed further. This recipe will show you how to do it.
Let's assume that we have the following index structure (add this to your schema.xml file in the field definition section):
<field name="description_string" type="string" indexed="true" stored="true" />
<field name="description_split" type="text_split" indexed="true" stored="true" />
To split the text in the description_split field on whitespace, we should add the following type definition:
<fieldType name="text_split" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
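To get a feel for what this type does to a value, here is a minimal Python sketch that approximates the behavior outside of Solr. It is not Solr code: the function names `whitespace_tokenize` and `string_field_tokens` are hypothetical, and Python's `str.split()` is only assumed to be close to the whitespace tokenizer (the two may disagree on some unusual Unicode space characters).

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Approximate a whitespace tokenizer: split on runs of whitespace only.

    Punctuation and letter case are left untouched, because no filters
    are configured in the text_split analyzer.
    """
    return text.split()


def string_field_tokens(text: str) -> List[str]:
    """Approximate a 'string' field: the value is not analyzed,
    so the whole value is kept as a single term."""
    return [text]


if __name__ == "__main__":
    sample = "test text"
    print(string_field_tokens(sample))   # ['test text']
    print(whitespace_tokenize(sample))   # ['test', 'text']
```

The contrast is the point of the two fields in the index structure: a value stored in description_string can only be matched as a whole, while the same value in description_split can be matched word by word.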
To test our type, I've indexed the following XML file:
<add> <doc> <field name="description_string">test text</field> <field name="description_text...