Beam is a programming model—what does that mean, exactly? It means that Beam defines essential primitives from which you can construct big data processing pipelines. By design, your processing logic is unified across batch and stream processing, over both bounded and unbounded data. One important practical benefit is that Beam lets you reuse the same code for processing an incoming stream, reprocessing historical data (for example, after fixing a bug or receiving a data dump), and running experiments or tests on samples of data.
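To make that reuse concrete, here is a plain-Python sketch (an illustrative stand-in, not Beam's actual API): a single piece of processing logic, written once, is applied unchanged to a bounded historical collection and to a simulated live stream. The names `count_words`, `historical`, and `live_source` are hypothetical.

```python
from typing import Dict, Iterable, Iterator

def count_words(lines: Iterable[str]) -> Dict[str, int]:
    """The processing logic: split lines into words and tally them.

    Written once; the caller decides whether the input is a bounded
    batch (a list) or an unbounded stream (a generator).
    """
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# Batch: reprocess a historical data dump (a bounded collection).
historical = ["beam unifies batch", "and stream processing"]
print(count_words(historical))

# Stream: the very same logic consumes a (simulated) live source.
def live_source() -> Iterator[str]:
    yield from ["beam unifies batch", "and stream processing"]

print(count_words(live_source()))
```

In Beam itself, the batch/stream distinction is likewise pushed to the edges of the pipeline (the sources and sinks), so the transforms in the middle stay identical.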
The essential structure of your computation is independent of the data processing engine that executes it. Today there are runners—libraries for executing Beam pipelines on various data processing systems—for Apache Apex, Apache Flink, Apache Spark, Google Cloud Dataflow, and Apache Gearpump (incubating), with others underway for JStorm, Apache Tez, and Apache Hadoop MapReduce.
A Beam pipeline is also independent...