We have already understood what sharding is in Chapter 3, Making Big Data Work for Hadoop and Solr. As the data gets populated in Apache Solr, the size of the Solr index grows, given that each Solr index contains many files/documents/records, and it becomes large enough to fit on a single machine. Additionally, with the growth of the indexes, it is possible that the performance of search query can slow down. Single Solr machine also suffers from concurrency issues and low I/O support. This, in turn, demands distributing the index across multiple machines. Solr can run a distributed query across multiple machines aggregating the results into one.
With the release of Solr 4.1, lots of these things are automated. SolrCloud does index distribution to the appropriate shard; it also takes care of distributing search across multiple shards. Search is possible with near real time, after the document is committed. ZooKeeper provides load balancing...