Scaling Big Data with Hadoop and Solr, Second Edition

Apache Solr can use HDFS to build and store its indexes on a Hadoop cluster. This approach does not use a MapReduce-based framework for indexing. The following diagram shows the interaction pattern between Solr and HDFS. You can read more about Apache Hadoop at http://hadoop.apache.org/docs/r2.4.0/.

Let's understand how this can be done.
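As a rough preview of where the setup ends up, the sketch below assembles the JVM system properties that Solr's HdfsDirectoryFactory reads when its index lives on HDFS. The helper name solr_hdfs_jvm_args, the NameNode address hdfs://localhost:9000, the /solr path, and the start.jar launch style (typical of Solr 4.x) are assumptions for illustration; check the property names against the Solr version you actually run.

```python
from typing import List


def solr_hdfs_jvm_args(hdfs_home: str) -> List[str]:
    """Return JVM system properties that point Solr at HDFS for index storage.

    hdfs_home is the HDFS directory under which Solr keeps its index data.
    """
    if not hdfs_home.startswith("hdfs://"):
        raise ValueError("hdfs_home must be an hdfs:// URI")
    return [
        # Store index files through the HDFS-backed directory implementation.
        "-Dsolr.directoryFactory=HdfsDirectoryFactory",
        # Use the HDFS-aware index lock instead of a local file-system lock.
        "-Dsolr.lock.type=hdfs",
        # Root location in HDFS for Solr index data.
        f"-Dsolr.hdfs.home={hdfs_home}",
    ]


if __name__ == "__main__":
    # Hypothetical NameNode address and path; replace with your own.
    args = solr_hdfs_jvm_args("hdfs://localhost:9000/solr")
    print(" ".join(["java", *args, "-jar", "start.jar"]))
```

Keeping the properties in one small function makes it easy to reuse the same settings across several Solr nodes, all of which should point at the same HDFS location scheme.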

To start with, the first and most important task is to set up Apache Hadoop, either on a single machine (single-node configuration) or as a cluster. You can download the latest Hadoop tarball or zip from http://hadoop.apache.org. Newer generations of Hadoop (2.x and later) run MapReduce on top of YARN, the cluster resource manager; this arrangement is also known as MapReduce 2 (MRv2).
Depending on your requirements, you can set up a single node (documentation: http://hadoop.apache.org/docs/r<version>/hadoop-project-dist/hadoop-common/SingleCluster.html) or a cluster (documentation: http://hadoop.apache.org/docs/r<version>/hadoop-project-dist/hadoop-common/ClusterSetup.html).
Typically...
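For the single-node case, the Hadoop setup guide linked above has you edit two small files, core-site.xml and hdfs-site.xml. The following is a minimal sketch that renders both from name/value pairs; the helper name hadoop_site_xml is hypothetical, and the values (hdfs://localhost:9000, replication factor 1) are the ones commonly shown for a pseudo-distributed node, so adjust them for your environment.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_site_xml(properties: Dict[str, str]) -> str:
    """Render a Hadoop *-site.xml document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" makes tostring return a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


# Assumed single-node values; a real cluster would use its NameNode host
# and a higher replication factor.
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}
HDFS_SITE = {"dfs.replication": "1"}

if __name__ == "__main__":
    print(hadoop_site_xml(CORE_SITE))  # contents for etc/hadoop/core-site.xml
    print(hadoop_site_xml(HDFS_SITE))  # contents for etc/hadoop/hdfs-site.xml
```

The fs.defaultFS value is the same HDFS address that Solr is later pointed at, so it is worth deciding on it once and reusing it.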

By: Hrishikesh Vijay Karambelkar

Working with the Solr HDFS connector