The SOLR-1301 patch provides a RecordWriter to generate a Solr index, and an OutputFormat for writing the index out. With this patch we only need to implement the reducer, since the indexing is done on the reduce side. You can follow these steps to achieve reduce-side indexing using SOLR-1301:
Get solrconfig.xml, schema.xml, and the other configuration files into the conf folder, and place all the Solr libraries in the lib folder.

Implement a SolrDocumentConverter that takes the <key, value> pair and returns a SolrInputDocument. This converts output records to Solr documents:

```java
public class HadoopDocumentConverter extends SolrDocumentConverter<Text, Text> {
    @Override
    public Collection<SolrInputDocument> convert(Text key, Text value) {
        ArrayList<SolrInputDocument> list = new ArrayList<SolrInputDocument>();
        SolrInputDocument document = new SolrInputDocument();
        document.addField("key", key);
        document.addField("value", value);
        list.add(document);
        return list;
    }
}
```
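For the converter above to index successfully, the schema must define the fields it populates. A minimal sketch of the matching schema.xml entries follows; the field names key and value come from the converter, while the field types and attributes shown here are assumptions, not part of the patch:

```xml
<!-- Hypothetical schema.xml fragment matching the converter's fields -->
<fields>
  <!-- "key" doubles as the unique identifier of each document -->
  <field name="key" type="string" indexed="true" stored="true" required="true"/>
  <field name="value" type="text_general" indexed="true" stored="true"/>
</fields>
<uniqueKey>key</uniqueKey>
```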
Create a simple reducer as follows:

```java
public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        SolrRecordWriter.addReducerContext(context);
    }
}
```
Now configure the reducer and the rest of the Hadoop job. Depending on the batch size configuration (that is, solr.record.writer.batch.size), documents are buffered before the index is updated:

```java
SolrDocumentConverter.setSolrDocumentConverter(HadoopDocumentConverter.class, job.getConfiguration());
job.setReducerClass(SolrBatchIndexerReducer.class);
job.setOutputFormatClass(SolrOutputFormat.class);
File solrHome = new File("/user/hrishikes/solr");
SolrOutputFormat.setupSolrHomeCache(solrHome, job.getConfiguration());
```
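To see how these calls fit into a complete job, the following is a minimal driver sketch. The class name SolrIndexDriver, the job name, and the use of the first command-line argument as the input path are illustrative assumptions; only the converter, reducer, output format, and solrHome wiring come from the steps above:

```java
// Hypothetical driver wiring together the SOLR-1301 pieces described above.
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SolrIndexDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "solr-indexing");
        job.setJarByClass(SolrIndexDriver.class);

        // Assumed: input records arrive as <Text, Text> pairs from args[0].
        FileInputFormat.addInputPath(job, new Path(args[0]));
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // The SOLR-1301 configuration from the step above.
        SolrDocumentConverter.setSolrDocumentConverter(
                HadoopDocumentConverter.class, job.getConfiguration());
        job.setReducerClass(SolrBatchIndexerReducer.class);
        job.setOutputFormatClass(SolrOutputFormat.class);
        SolrOutputFormat.setupSolrHomeCache(
                new File("/user/hrishikes/solr"), job.getConfiguration());

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This sketch requires the Hadoop and SOLR-1301 patch classes on the classpath; it is job configuration wiring rather than standalone runnable code.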
The solrHome is the path where solr.zip is stored. Each task starts an EmbeddedSolrServer instance to perform its share of the indexing.
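Based on the conf and lib folders described in the first step, the solrHome directory passed to setupSolrHomeCache would be laid out roughly as follows (a sketch; only solrconfig.xml, schema.xml, conf, and lib are named in the text):

```
solr/
├── conf/
│   ├── solrconfig.xml
│   └── schema.xml
└── lib/
    └── (Solr libraries)
```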