Imagine your data contains duplicates because it comes from different sources. For example, you have books from several suppliers, but you only want a single document for each book name. You could use the field collapsing feature at query time, but that affects query performance, and we would like to avoid that. This recipe shows you how to use Solr's deduplication functionality instead.
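To make the idea concrete, here is a minimal Python sketch of deduplication performed before data is stored rather than at query time. It is not Solr's implementation. The `signature` and `deduplicate` helpers, the choice of MD5, and the sample book records are all assumptions made for illustration only.

```python
import hashlib


def signature(doc, fields=("name",)):
    """Hash the chosen fields into a digest that identifies duplicates.

    Hypothetical helper: the field list and the MD5 hash are illustrative choices.
    """
    joined = "\x1f".join(str(doc.get(field, "")) for field in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def deduplicate(docs, fields=("name",)):
    """Keep the first document seen for each signature, preserving input order."""
    unique = {}
    for doc in docs:
        # setdefault stores the document only if the signature is new
        unique.setdefault(signature(doc, fields), doc)
    return list(unique.values())


# Hypothetical sample data: the same book delivered by two suppliers
books = [
    {"id": "1", "name": "Solr Cookbook", "type": "book"},
    {"id": "2", "name": "Solr Cookbook", "type": "book"},
    {"id": "3", "name": "Lucene in Action", "type": "book"},
]

unique_books = deduplicate(books)
print([book["id"] for book in unique_books])
```

Because duplicates are dropped once, when the data is loaded, queries do not pay any extra cost later, which is the trade-off this recipe is after.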
We start with a simple index structure. Place the following in the fields section of your schema.xml file:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="name" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="type" type="string" indexed="true" stored="true" multiValued="false"/>
For the purpose of this recipe, we assume that the following data is stored in the data.xml file:
<add> <doc>...