Capabilities (and non-capabilities) of a streaming application
We talked about the Lambda architecture in earlier chapters. As a reminder, Lambda has two layers:
- A batch layer that works on a huge dataset, essentially recreating views over a period of time
- A stream/speed layer that works on a small dataset and runs just enough logic to give a good-enough answer quickly
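To make the contrast concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the page-view events, the names batch_view, SpeedView, and query, and the idea of summing both views at query time are hypothetical choices for this example, not part of any particular framework.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer (sketch): recompute the whole view from the full dataset."""
    return Counter(event["page"] for event in master_dataset)


class SpeedView:
    """Speed layer (sketch): update a small view incrementally, one event at a time."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["page"]] += 1


def query(batch, speed, page):
    """Assumed merge step: combine the batch view with the recent speed view."""
    return batch[page] + speed.counts[page]


# Hypothetical data: events already covered by the last batch run ...
master_dataset = [{"page": "/home"}, {"page": "/home"}, {"page": "/about"}]
# ... and events that arrived after that run started.
recent_events = [{"page": "/home"}, {"page": "/pricing"}]

batch = batch_view(master_dataset)
speed = SpeedView()
for event in recent_events:
    speed.update(event)

print(query(batch, speed, "/home"))     # 3
print(query(batch, speed, "/pricing"))  # 1
```

The batch function touches every record each time it runs, while the speed view does a constant amount of work per incoming event. That difference in work per update is why only the speed layer can answer within a tight latency budget.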
Lambda architecture's speed layer
Let's take a closer look at the speed layer so that we share a common understanding.
In the batch layer, we run a compute algorithm over the entire dataset. This dataset can be, and usually is, in the petabyte range. Clearly, this is a very resource-intensive operation, and one that throws the concept of latency out the window: latency is simply not a concern for the batch layer. For the speed layer, however, latency plays an important role. If the speed layer cannot produce results within an acceptable latency, usually a few seconds, the speed layer may as well be regarded as a distant cousin...