High-level reference architecture
If I had to define a streaming system in a short phrase, I would be very tempted to call it a complex adaptive system. A data-intensive streaming system needs to be able to analyze data without waiting for it to be persisted. This is an important requirement, and possibly the one requirement that, more than any other, defines the architecture of a streaming system. Data should be made meaningful at the very moment it is ingested. A single second of delay can sometimes be the difference between winning and losing. This may sound dramatic, but let me try to illustrate it with a real-world example.
Imagine you are part of a security incident response team trying to identify the actors behind a cyberattack that is currently unfolding in your customers' network. You see hundreds of thousands of IP addresses in the logs, and you want to separate the relevant ones (those that may indicate a suspicious actor) from the irrelevant ones...
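To make the scenario a little more concrete, here is a minimal sketch of what filtering IP addresses at the moment of ingestion could look like. Everything in it is an assumption made for illustration: the `WATCHLIST` contents (taken from reserved documentation address ranges), the `suspicious_ips` function name, and the idea that log lines arrive as plain strings containing IPv4 addresses. It is not meant as the architecture this chapter describes, only as a picture of "analyze before you persist".

```python
import ipaddress
import re
from typing import Iterable, Iterator

# Hypothetical watchlist. These networks come from reserved documentation
# ranges and stand in for whatever threat intelligence the team really uses.
WATCHLIST = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]

# Loose IPv4 matcher; real validation is delegated to the ipaddress module.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def suspicious_ips(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield each watchlisted IP as soon as the line containing it arrives."""
    for line in log_lines:
        for candidate in IPV4_PATTERN.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looked like an IP (e.g. 999.1.1.1) but is not one
            if any(address in network for network in WATCHLIST):
                yield candidate
```

Because `suspicious_ips` is a generator, it can be fed by any iterable that produces lines as they arrive (a socket reader, a message queue consumer, a tailing file handle), and each match is emitted without waiting for the whole log to be collected or written to storage.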