Using patterns to replace tokens
Let's assume that we want to search inside user blog posts. We need to prepare a simple search that returns only the identifiers of the matched documents. However, we want to remove some words because they contain explicit language. Of course, we could do this with the stop words functionality, but what if we also want to know how many documents had their contents censored, so that we can compute statistics on them? In such a case, the stop words functionality is not enough; we need something more, namely regular expressions. This recipe will show you how to meet these requirements using Solr and one of its filters.
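To make the idea concrete before walking through the steps, here is a minimal sketch of an analysis chain that rewrites matching tokens instead of silently dropping them, which is what makes censored documents countable later. The field type name text_censored, the choice of tokenizer, the pattern, and the marker token censored are all assumptions made for illustration; they are not taken from the recipe itself, so treat this as one possible shape rather than the exact configuration.

```xml
<!-- Hypothetical field type: the name, tokenizer, pattern, and marker are illustrative assumptions. -->
<fieldType name="text_censored" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split the text on whitespace so each word becomes one token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercase tokens so the pattern below does not need to handle case. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Replace every token that begins with "word" with a fixed marker token. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^word.*" replacement="censored" replace="all"/>
  </analyzer>
</fieldType>
```

With a marker like this in place, a query for the marker term on the censored field would be one way to find the documents whose contents were altered, assuming the marker never occurs naturally in the posts.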
How to do it...
To achieve our needs, we will use the solr.PatternReplaceFilterFactory filter. Let's assume that we want to remove all the words that start with the prefix word. These are the steps needed:
First, we need to create our index structure, so the fields we add to the schema.xml file are as follows:
<field name="id" type="string" indexed...