Apache Solr for Indexing Data
Solr provides a way to prevent duplicate or near-duplicate documents from being indexed by using a signature (fingerprint) field. It supports this kind of deduplication natively through the Signature class, which can also be extended to add new hash and signature implementations.
Let's see how we can implement deduplication in Solr. We'll take the musicCatalog core that we used in the previous chapter and modify it:
Copy the musicCatalog core and create a new core called musicCatalog-dedupe from it. After we have created the new core, we'll change schema.xml to add a signature field that will contain the document signature/fingerprint:
<!-- Field to store the fingerprint/signature -->
<field name="signature" type="string" indexed="true" stored="true" required="true" multiValued="false" />
After adding the field, we'll add a new UpdateRequestProcessor element to the solrconfig.xml configuration file, which will...
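As an illustration only, a deduplication chain in solrconfig.xml might look like the following minimal sketch. It relies on Solr's SignatureUpdateProcessorFactory and the Lookup3Signature class, and writes the computed hash into the signature field defined above. The chain name dedupe and the field names songName and artistName are assumptions made for this example; replace them with the fields that actually exist in your musicCatalog-dedupe schema.

```xml
<!-- Sketch of a deduplication chain; the chain name and the values in "fields" are hypothetical -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <!-- Turn signature generation on -->
    <bool name="enabled">true</bool>
    <!-- Field that stores the computed fingerprint (declared in schema.xml) -->
    <str name="signatureField">signature</str>
    <!-- false: keep documents with a matching signature; true: a new document replaces them -->
    <bool name="overwriteDupes">false</bool>
    <!-- Comma-separated fields used to compute the signature (assumed names) -->
    <str name="fields">songName,artistName</str>
    <!-- Hash implementation; MD5Signature and TextProfileSignature are alternatives -->
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

A chain like this only takes effect when an update request uses it, for example by passing update.chain=dedupe as a request parameter or by setting update.chain in the defaults of the update request handler. Lookup3Signature and MD5Signature detect exact duplicates, whereas TextProfileSignature is intended for near-duplicate detection on longer text.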