Distributed storage
Distributed storage means dividing your data into logical chunks and storing those chunks on physically separate machines. A software layer built on top of this storage is responsible for distributing the data and for querying it across the machines. The same layer provides capabilities such as aggregation, filtering, and a single point of interaction.
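To make the role of that software layer concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Coordinator` class, its method names, the use of in-memory lists to stand in for physical machines, and the round-robin placement. A real system would talk to remote machines over the network and would usually place data more carefully.

```python
from typing import Callable


class Coordinator:
    """Hypothetical single point of interaction over several 'machines'."""

    def __init__(self, machine_count: int) -> None:
        # Each inner list stands in for the local storage of one machine.
        self.machines = [[] for _ in range(machine_count)]
        self._next = 0

    def insert(self, record: dict) -> None:
        # Round-robin placement, chosen only to keep the sketch short.
        self.machines[self._next].append(record)
        self._next = (self._next + 1) % len(self.machines)

    def query(self, predicate: Callable[[dict], bool]) -> list:
        # Filtering: each machine scans only its own chunk of the data.
        partials = [[r for r in m if predicate(r)] for m in self.machines]
        # The coordinator merges the partial results into one answer.
        return [r for part in partials for r in part]

    def sum_field(self, field: str) -> int:
        # Aggregation: compute a partial sum per machine, then combine.
        return sum(sum(r[field] for r in m) for m in self.machines)


coordinator = Coordinator(machine_count=3)
for amount in (10, 20, 30, 40, 50):
    coordinator.insert({"amount": amount})

large_orders = coordinator.query(lambda r: r["amount"] > 25)
total_amount = coordinator.sum_field("amount")
```

The caller only ever talks to `coordinator`; which machine holds which record stays hidden behind it.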
There are two main concepts to understand about distributed storage:
- Data partitioning
- Data replication
Let's briefly look at them now. We will talk about them in more detail in the coming chapters.
Data partitioning is the process of dividing the dataset into logical chunks, usually with a deterministic algorithm, and distributing those chunks over multiple servers, or shards. Each shard is an independent data store, and collectively, the shards make up a single logical data store.
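As a rough illustration of a deterministic partitioning algorithm, the sketch below hashes each key and takes the result modulo the number of shards. This scheme is an assumption picked for simplicity, not the only option, and the names `shard_for` and `ShardedStore` are hypothetical. The shards are plain dictionaries standing in for independent data stores. The sketch uses `hashlib` rather than Python's built-in `hash()` because string hashing with `hash()` is randomized per process by default, which would break determinism across restarts.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index; the same key always gets the same index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Hypothetical logical store made up of several independent shards."""

    def __init__(self, shard_count: int) -> None:
        # Each dict stands in for one independent data store.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key: str, value: object) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        # Only the shard that owns the key is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def __len__(self) -> int:
        # Collectively, the shards make up the full dataset.
        return sum(len(shard) for shard in self.shards)


store = ShardedStore(shard_count=4)
for user_id in ("alice", "bob", "carol", "dave", "erin"):
    store.put(user_id, {"name": user_id.title()})

alice = store.get("alice")
```

One trade-off of the modulo approach is that changing the shard count remaps most keys, so it fits best when the number of shards is fixed.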
There are many benefits of data...