Although you can have a completely distributed system for your Big Data search, there is a limit to how far you can go. As you keep distributing the index across more shards, you may end up facing what is called the "laggard problem" for the indexes of your instance.
This problem states that the response time of your search query, which is an aggregation of results from all the shards, is governed by the following formula:
QueryResponse = avg(max(shardResponseTime))
This means that if you have many shards, the odds that at least one of them responds slowly to a query (due to some anomaly) go up. Because every query has to wait for its slowest shard, your query response time will start increasing.
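The effect of the formula above can be illustrated with a small Python sketch. It is not part of Solr. The function names, the 1 percent chance of a slow shard, and the 10 ms and 500 ms response times are all assumptions chosen for illustration, and shards are assumed to slow down independently of each other.

```python
import random


def prob_any_slow(num_shards, p_slow):
    """Chance that at least one of num_shards independent shards is slow."""
    return 1.0 - (1.0 - p_slow) ** num_shards


def query_response(shard_times):
    """A distributed query is only done when its slowest shard has answered."""
    return max(shard_times)


def avg_query_response(num_shards, num_queries=5000, p_slow=0.01,
                       fast_ms=10.0, slow_ms=500.0, seed=42):
    """Average, over many simulated queries, of the slowest shard time.

    All timing values are hypothetical; they only serve to show the trend.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(num_queries):
        times = [slow_ms if rng.random() < p_slow else fast_ms
                 for _ in range(num_shards)]
        total += query_response(times)
    return total / num_queries


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, "shards:",
              "P(any slow) =", round(prob_any_slow(n, 0.01), 3),
              "avg response (ms) =", round(avg_query_response(n), 1))
```

Under these assumptions, a single shard is slow for only 1 in 100 queries, but with 100 shards roughly 63 percent of queries hit at least one slow shard, so the average response time climbs even though no individual shard has become slower.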
The distributed search in Apache Solr has many limitations. Each document uploaded to the distributed Big Data index must have a unique key, and that unique key must be stored in the Solr repository. To do that, the Solr schema.xml should have stored=true against the key attribute. This unique key has to be unique across all shards. Some of the...