One of the most common problems you have probably come across is having to split text on whitespace to separate words from each other so that the text can be processed further. This recipe will show you how to do it.
Let's start with the assumption that we have the following index structure (add this to the field definition section of your schema.xml file):
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="description_string" type="string" indexed="true" stored="true" />
<field name="description_split" type="text_split" indexed="true" stored="true" />
To split the text in the description_split field, we should add the following type definition:
<fieldType name="text_split" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
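To get a feel for what this type does to the text, here is a rough sketch in Python that only approximates whitespace tokenization. It is not Solr code: the function name whitespace_tokenize and the sample sentence are made up for illustration, and Python's notion of whitespace may differ from Lucene's in some edge cases (for example, non-breaking spaces).

```python
def whitespace_tokenize(text):
    """Approximate a whitespace tokenizer: split on runs of whitespace only.

    Hypothetical helper for illustration; it is not part of Solr or Lucene.
    """
    # str.split() with no arguments splits on any run of whitespace
    # and never returns empty strings.
    return text.split()


# Sample input invented for this sketch.
tokens = whitespace_tokenize("test text,  with-hyphen\tand spaces")
print(tokens)
```

Note that punctuation and hyphens stay attached to the tokens. Only whitespace separates words, and no lowercasing or other filtering is applied, because the type definition has a tokenizer but no filters.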
To test our type, I've indexed the following XML file:
<add> <...