Scaling Big Data with Hadoop and Solr, Second Edition

By: Hrishikesh Vijay Karambelkar

Sharding algorithm and fault tolerance


We have already looked at shards, collections, and replicas. In this section, we will look at some important aspects of sharding and the role it plays in scalability and high availability. The strategy for creating new shards depends heavily on the hardware and the shard size. Say you have two machines, M1 and M2, of the same configuration, each hosting one shard. Shard A is loaded with 1 million indexed documents, and shard B with 100 documents. When a query is fired, it is sent to both shards, and the overall response time is determined by the slowest node (in this case, the one hosting shard A). Hence, keeping shards close to equal in size tends to perform better in this case.
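The effect of uneven shard sizes can be illustrated with a small toy model. This is only a sketch under stated assumptions: the function names and the linear cost of 50 ms per million documents are hypothetical and are not Solr measurements. The only point taken from the discussion above is that the combined response is ready when the slowest shard has answered.

```python
# Toy model, not Solr code: a distributed query waits for every shard,
# so its latency is the maximum of the per-shard latencies.

def shard_latency_ms(doc_count, ms_per_million_docs=50.0):
    """Hypothetical cost model: latency grows linearly with shard size."""
    return doc_count / 1_000_000 * ms_per_million_docs


def query_latency_ms(shard_doc_counts):
    """The merged response is ready only after the slowest shard answers."""
    return max(shard_latency_ms(n) for n in shard_doc_counts)


skewed = [1_000_000, 100]      # shard A and shard B from the example
balanced = [500_050, 500_050]  # same total number of documents, split evenly

skewed_ms = query_latency_ms(skewed)
balanced_ms = query_latency_ms(balanced)

print(f"skewed shards:   {skewed_ms:.2f} ms")
print(f"balanced shards: {balanced_ms:.2f} ms")
```

Under this assumed model, the evenly split layout takes about half as long as the skewed one, because no single shard has to scan nearly the whole index. Real query latency depends on caching, query type, and hardware, so treat the numbers as illustrative only.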

Document Routing and Sharding

Typically, when any enterprise search is deployed, the size of documents to be indexed keeps growing over time. Since SolrCloud provides a way to create a cluster of Solr nodes running on index shards, it becomes feasible to scale up...