Stemming is a very common requirement; it is the process of reducing words to their root forms (or stems). Imagine an e-commerce bookstore where you store the books' names and descriptions. You want to be able to find words such as shown and showed when you type the word show, and vice versa. We can meet this requirement with stemming algorithms. The catch is that there are no general-purpose stemmers; they are language-specific. This recipe will show you how to add stemming to your data analysis chain and where to look for a list of stemmers.
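To make the idea concrete, here is a minimal Python sketch of why stemming lets show, showed, and shown match each other. It is a toy, not the algorithm Solr uses: the names toy_stem, matches, IRREGULAR, and SUFFIXES are hypothetical, the suffix list is deliberately tiny, and the irregular form shown is handled by an assumed lookup table. The point is only that the same reduction is applied to both the indexed text and the query.

```python
# Toy illustration only: real stemmers use far richer, language-specific rules.
IRREGULAR = {"shown": "show"}  # hypothetical lookup for forms suffix rules miss
SUFFIXES = ("ing", "ed", "s")  # checked in this order


def toy_stem(word: str) -> str:
    """Reduce a word to a crude stem by lowercasing and stripping one suffix."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query: str, indexed_text: str) -> bool:
    """Return True if the stemmed query equals any stemmed token of the text."""
    indexed_stems = {toy_stem(token) for token in indexed_text.split()}
    return toy_stem(query) in indexed_stems


if __name__ == "__main__":
    print(matches("show", "The results are shown below"))
    print(matches("shown", "He showed it"))
```

Because both sides pass through the same function, a query for any of the three forms finds text containing any of the others. A search engine does the same thing by running the stemmer in the analysis chain at both index time and query time.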
To stem English text, we need to take the following steps:
We will start with the index structure. Let's assume that our index consists of three fields that we defined in the schema.xml file:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true" />
<field name="description" type="text_stem" indexed...