Analyzing text data is not only about stemming, removing diacritics (if you are not familiar with the word, please take a look at http://en.wikipedia.org/wiki/Diacritic), and choosing the right format for the data. Let's assume that our client wants to be able to search by the words and numbers that make up product identifiers. For example, he would like to be able to find the product identifier ABC1234XYZ by using ABC, 1234, or XYZ.
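To make the requirement concrete, the following is a minimal Python sketch of the splitting behavior the client is asking for. It is only an illustration of the idea, not Solr code: the function names split_identifier and matches are hypothetical, and lowercasing the parts is an assumption made here so that the comparison is case insensitive.

```python
import re


def split_identifier(identifier: str) -> list:
    """Split an identifier into runs of letters and runs of digits.

    Lowercasing is an assumption of this sketch, not something the
    text above requires.
    """
    return [part.lower() for part in re.findall(r"[A-Za-z]+|[0-9]+", identifier)]


def matches(query: str, identifier: str) -> bool:
    """Return True if the query equals one of the identifier's parts."""
    return query.lower() in split_identifier(identifier)


if __name__ == "__main__":
    print(split_identifier("ABC1234XYZ"))
    for query in ("ABC", "1234", "XYZ", "AB"):
        print(query, matches(query, "ABC1234XYZ"))
```

Under these assumptions, each of ABC, 1234, and XYZ matches the identifier ABC1234XYZ, while a fragment such as AB does not, because only whole letter or digit runs are treated as searchable parts.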
Let's start with the index that consists of three fields (add this to your schema.xml file to the field definition section):
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text" indexed="true" stored="true" />
<field name="description" type="text_split" indexed="true" stored="true" />
The second step is to define our text_split type, which should look like the following code (add this to your schema.xml file):
<fieldType...