Katta is an open source project that lets you store and serve data in a distributed, fault-tolerant manner. Although the project no longer sees much active development, many organizations have taken Katta and customized it to meet their distributed search needs. By using Katta together with Hadoop and Solr, you can run Apache Solr in a distributed and replicated configuration. With the help of Katta, two important tasks can be deployed on the Hadoop framework: indexing and searching.
The following diagram depicts the Katta architecture:
Each Katta Hadoop cluster has a master node, and the remaining nodes actively participate in storing data. The master node manages the other nodes and decides which index shards are assigned to each of them. Each node is responsible for serving the shards assigned to it. A content server on each node determines the type of shard...
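The master's job of assigning index shards to nodes, with each shard replicated on more than one node, can be illustrated with a minimal Python sketch. This is not Katta's actual distribution policy. The function name `assign_shards`, the `replication` parameter, and the "least-loaded nodes first" placement rule are hypothetical choices made only to show the idea.

```python
def assign_shards(shards, nodes, replication=2):
    """Hypothetical master-side policy: place each shard on the
    `replication` distinct nodes that currently hold the fewest shards."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    # node -> shards it serves (every node listed, even if it gets none)
    assignment = {node: [] for node in nodes}
    for shard in shards:
        # sorted() is stable, so ties are broken by the original node order
        targets = sorted(nodes, key=lambda n: len(assignment[n]))[:replication]
        for node in targets:
            assignment[node].append(shard)
    return assignment


if __name__ == "__main__":
    plan = assign_shards(["shard-0", "shard-1", "shard-2"],
                         ["node-a", "node-b", "node-c"], replication=2)
    for node, served in plan.items():
        print(node, served)
```

With three shards, three nodes, and a replication factor of 2, every shard ends up on two different nodes and each node serves two shards, so losing any single node still leaves a copy of every shard available. A real master would also have to react to nodes joining and leaving the cluster, which this sketch does not attempt.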