Scaling Big Data with Hadoop and Solr, Second Edition

By: Hrishikesh Vijay Karambelkar

Using Solr 1045 Patch – map-side indexing


The Apache Solr 1045 patch (SOLR-1045) gives Solr users a way to build Solr indexes using the MapReduce framework of Apache Hadoop. Once created, the index can be pushed to Solr storage. The following diagram depicts the Mapper and Reducer in Hadoop:

Each Apache Hadoop mapper transforms its input records into a set of (key, value) pairs, which are then converted into SolrInputDocument objects. The Mapper task then builds an index from these SolrInputDocument objects.
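The map-side flow described above can be sketched in plain Python. This is only a conceptual simulation, not the Hadoop or Solr API and not the patch's own code: the tab-separated record layout, the field names (`id`, `title`, `body`), and the helper names `map_record` and `build_shard` are assumptions made for illustration, and a dictionary stands in for SolrInputDocument.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical stand-in for SolrInputDocument: field name -> field value.
Document = Dict[str, str]


def map_record(line: str) -> Iterator[Tuple[str, Document]]:
    """Turn one input record into a (key, document) pair.

    Assumes a tab-separated record of the form: id, title, body.
    """
    doc_id, title, body = line.rstrip("\n").split("\t", 2)
    yield doc_id, {"id": doc_id, "title": title, "body": body}


def build_shard(records: Iterable[str]) -> Dict[str, Document]:
    """Simulate one mapper task: index every document from its input split.

    The returned dict plays the role of the index that the mapper creates.
    """
    shard: Dict[str, Document] = {}
    for line in records:
        for key, doc in map_record(line):
            shard[key] = doc
    return shard


# One input split, as a mapper task might receive it.
split = [
    "1\tHadoop\tDistributed storage and processing",
    "2\tSolr\tFull-text search server",
]
shard = build_shard(split)
```

In a real job, each mapper would run over its own input split in parallel, so the output is one index per mapper rather than a single global index.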

The Reducer de-duplicates the indexes produced by the mappers and merges them if needed. Once the indexes are created, you can load them into your Solr instance and use them for searching. You can read more about this patch at https://issues.apache.org/jira/browse/SOLR-1045.
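The reduce step can be sketched the same way. Again, this is a plain-Python illustration rather than the patch's implementation: the helper name `merge_shards` is hypothetical, each index is modeled as a dictionary keyed by document ID, and the de-duplication policy (the first occurrence of an ID wins) is an assumption, since the text does not say how duplicates are resolved.

```python
from typing import Dict, Iterable

# Hypothetical stand-ins: a document is a field map, a shard maps ID -> document.
Document = Dict[str, str]
Shard = Dict[str, Document]


def merge_shards(shards: Iterable[Shard]) -> Shard:
    """Merge per-mapper indexes into one, dropping duplicate document IDs.

    Assumed policy: the first document seen for a given ID is kept.
    """
    merged: Shard = {}
    for shard in shards:
        for doc_id, doc in shard.items():
            merged.setdefault(doc_id, doc)
    return merged


# Two mapper outputs that both contain document "2".
shard_a = {"1": {"id": "1", "title": "Hadoop"}, "2": {"id": "2", "title": "Solr"}}
shard_b = {"2": {"id": "2", "title": "Solr (duplicate)"}, "3": {"id": "3", "title": "Lucene"}}

merged = merge_shards([shard_a, shard_b])
```

Here `merged` holds three documents, and document "2" keeps the version from `shard_a`. A different policy, such as keeping the most recent version, would change only the line that calls `setdefault`.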

The patch follows the standard process of patching your code through svn (Subversion). To apply a patch to your Solr instance, you first need to build your Solr instance from source. The instance should be supported by Solr...