Event-Data Pipelines
In a data-intensive application, the event-data pipeline is typically a more critical component than the query-data pipeline. It must handle a large volume of event data produced by many sources, and it must do so reliably, efficiently, and with fault tolerance. The event-data pipeline follows the publish-subscribe principle, as shown in the following diagram:
A client publishes a message to a central queuing server, and other clients that wish to receive messages subscribe with that server so that the messages are delivered to them.
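The publish-subscribe flow described here can be sketched in a few lines of Python. This is only an illustrative, in-memory sketch: the `InMemoryBroker` class, its `subscribe` and `publish` methods, and the `"clicks"` topic are hypothetical names chosen for the example, and a real queuing server would add persistence, networking, and delivery guarantees that are not modeled here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber is any callable that accepts one message (a dict here).
Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy stand-in for a central queuing server (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # Maps a topic name to the handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a client that wants messages published on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver `message` to every subscriber of `topic`.

        Returns the number of subscribers the message was delivered to.
        """
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


# Two subscribers register interest; one publisher sends a single event.
broker = InMemoryBroker()
received_a: List[dict] = []
received_b: List[dict] = []
broker.subscribe("clicks", received_a.append)
broker.subscribe("clicks", received_b.append)
delivered = broker.publish("clicks", {"user": "u1", "page": "/home"})
```

The publisher never refers to the subscribers directly; it only knows the topic name. That decoupling is what lets many sources and many consumers be added without changing each other.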
Whenever you define an event-data pipeline, keep in mind the three Vs of the data:
- Volume: What is the average and peak volume of the data that we expect?
- Variety: What varieties of data do we expect (documents, events, sizes, and so on)?
- Velocity: What are the average and peak velocities at which we can expect the data to arrive?
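One way to start answering these three questions is to profile a sample of real or expected events. The sketch below is a rough illustration under stated assumptions: the `profile_events` function, the `(timestamp_seconds, size_bytes, kind)` tuple layout, and the sample values are all made up for the example. It treats volume as bytes per second, velocity as events per second, and variety as the count of each event kind, which is one possible reading of the three Vs rather than a fixed definition.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed event shape for this sketch: (timestamp_seconds, size_bytes, kind).
Event = Tuple[int, int, str]


def profile_events(events: Iterable[Event]) -> dict:
    """Summarize a sample of events in terms of volume, variety, and velocity."""
    count_per_second: Counter = Counter()
    bytes_per_second: Counter = Counter()
    kinds: Counter = Counter()

    for timestamp, size, kind in events:
        count_per_second[timestamp] += 1
        bytes_per_second[timestamp] += size
        kinds[kind] += 1

    if not count_per_second:
        raise ValueError("cannot profile an empty sample")

    # Length of the observed window in whole seconds, including idle seconds.
    window = max(count_per_second) - min(count_per_second) + 1

    return {
        # Velocity: how fast events arrive.
        "avg_events_per_sec": sum(count_per_second.values()) / window,
        "peak_events_per_sec": max(count_per_second.values()),
        # Volume: how much data arrives.
        "avg_bytes_per_sec": sum(bytes_per_second.values()) / window,
        "peak_bytes_per_sec": max(bytes_per_second.values()),
        # Variety: which kinds of data arrive, and how often.
        "kinds": dict(kinds),
    }


# Made-up sample covering seconds 0 through 3 (second 2 is idle).
sample = [
    (0, 100, "click"),
    (0, 300, "click"),
    (1, 200, "document"),
    (3, 400, "click"),
]
profile = profile_events(sample)
```

Comparing the average figures with the peak figures gives a first indication of how much headroom the pipeline needs beyond its steady-state load.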
Let's go through the process of architecting an event-data pipeline.
There are many...