Most of the time, the default way of calculating the score of your documents is all you need. Sometimes, though, the standard behavior is not enough. Let's assume that you would like to use a different score calculation algorithm for the description field of your index. The current version of Solr allows you to do that, and this recipe will show you how to leverage this functionality.
Before choosing one of the score calculation algorithms available in Solr, it's good to read a bit about them. Describing all the algorithms is beyond the scope of this recipe and the book, but I would suggest going to the Solr Wiki pages (or looking at the Javadocs) and reading the basic information about the available implementations.
For the purpose of the recipe, let's assume we have the following index structure (just add the following entries to the fields section of your schema.xml file):
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general_dfr" indexed="true" stored="true" />
The string and text_general types are available in the default schema.xml file provided with the example Solr distribution. However, we want DFRSimilarity to be used to calculate the score for the description field. In order to do that, we introduce a new type, which is defined as follows (just add the following entries to the types section of your schema.xml file):
<fieldType name="text_general_dfr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="solr.DFRSimilarityFactory">
    <str name="basicModel">P</str>
    <str name="afterEffect">L</str>
    <str name="normalization">H2</str>
    <float name="c">7</float>
  </similarity>
</fieldType>
Also, to use per-field similarity, we have to add the following entry to the schema.xml file:
<similarity class="solr.SchemaSimilarityFactory"/>
And that's all. Now let's have a look and see how that works.
The index structure presented in this recipe is pretty simple, as there are only three fields. The one thing we are interested in is that the description field uses our own custom field type called text_general_dfr.
The thing we are mostly interested in is the new field type definition called text_general_dfr. As you can see, apart from the index and query analyzers, there is an additional section: similarity. It is responsible for specifying which similarity implementation to use to calculate the score for a given field. You are probably used to defining field types, filters, and other things in Solr, so you probably know that the class attribute specifies the class implementing the desired similarity, which in our case is solr.DFRSimilarityFactory. If there is a need, you can also specify additional parameters that configure the behavior of your chosen similarity class. In the previous example, we've specified four additional parameters: basicModel, afterEffect, normalization, and c, which all define the DFRSimilarity behavior.
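To illustrate how these parameters interact, here is a sketch of an alternative similarity section that could replace the one in text_general_dfr. It assumes the DFR options documented for solr.DFRSimilarityFactory in Solr 4.x: a different basic model and after effect, plus the H3 normalization, which takes a mu parameter instead of c. The values are illustrative only, not tuned recommendations, and the set of supported models may differ between Solr versions.

```xml
<!-- Illustrative alternative, not taken from the recipe: the values are assumptions -->
<similarity class="solr.DFRSimilarityFactory">
  <!-- I(F): inverse term frequency basic model -->
  <str name="basicModel">I(F)</str>
  <!-- B: ratio of two Bernoulli processes -->
  <str name="afterEffect">B</str>
  <!-- H3: length normalization based on a Dirichlet prior -->
  <str name="normalization">H3</str>
  <!-- mu is the smoothing parameter used by H3 (it plays the role c plays for H2) -->
  <float name="mu">900</float>
</similarity>
```

Which parameter accompanies the normalization depends on the normalization chosen, so c would most likely have no effect in this variant.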
The solr.SchemaSimilarityFactory entry is required to be able to specify the similarity on a per-field basis.
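As a rough sketch of where this entry lives, the abbreviated schema.xml skeleton below assumes the Solr 4.x layout, in which the global similarity is declared directly under the schema root rather than inside the types or fields sections. The name and version attribute values are placeholders, and the section contents are elided.

```xml
<!-- Abbreviated skeleton; name and version are placeholder values -->
<schema name="example" version="1.5">
  <types>
    <!-- string, text_general, text_general_dfr, ... -->
  </types>
  <fields>
    <!-- id, name, description -->
  </fields>
  <!-- Global entry that enables the per-field similarity declared in text_general_dfr -->
  <similarity class="solr.SchemaSimilarityFactory"/>
</schema>
```

Field types that do not declare their own similarity section should keep using the default similarity of your Solr version.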
In addition to per-field similarity definition, you can also configure the global similarity:
Apart from specifying the similarity class on a per-field basis, you can also replace the default similarity globally. For example, if you would like to use BM25Similarity as the default one, you should add the following entry to your schema.xml file:
<similarity class="solr.BM25SimilarityFactory"/>
As with the per-field similarity, you need to provide the name of the factory class that is responsible for creating the appropriate similarity class.
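The global factory can be parameterized in the same way as a per-field one. The sketch below assumes the k1 and b parameters accepted by solr.BM25SimilarityFactory; the values shown are the commonly cited BM25 defaults and are given only as an illustration. As far as I know, a schema holds a single global similarity entry, so this one would take the place of the solr.SchemaSimilarityFactory entry, and per-field similarity sections would then no longer be usable.

```xml
<!-- Illustrative values; tune them for your own data -->
<similarity class="solr.BM25SimilarityFactory">
  <!-- k1 controls term frequency saturation -->
  <float name="k1">1.2</float>
  <!-- b controls how strongly document length normalizes the score -->
  <float name="b">0.75</float>
</similarity>
```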