Building Big Data Pipelines with Apache Beam
PTransform is short for parallel transform: an Apache Beam primitive that transforms a PInput into a POutput. PInput is a labeling interface that marks objects as suitable inputs to a PTransform, while POutput marks objects as suitable outputs. We already know these objects quite well; the most typical one, used for both input and output, is PCollection. There are others as well, most notably PCollectionTuple and PCollectionList. There are also two special objects, PBegin and PDone. As we already know, an Apache Beam program (a pipeline) is a DAG whose edges represent PCollections and whose nodes represent PTransforms. PTransforms that take PBegin as input are the roots of the DAG, while PTransforms that produce PDone are its leaves.
This can be seen in the following diagram:
Figure 4.1 – DAG of PTransforms and PCollections
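To make these roles visible in the types, here is a minimal sketch, assuming the Beam Java SDK (`beam-sdks-java-core`) and the Direct Runner are on the classpath. `Create.of(...)` is a `PTransform<PBegin, PCollection<T>>`, so it acts as a root; `PCollectionList` bundles two PCollections into a single input for `Flatten`; and `TextIO.write()` is a `PTransform<PCollection<String>, PDone>`, so it acts as a leaf. The class name, step names, and output path are hypothetical and chosen only for illustration.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
import org.apache.beam.sdk.values.PDone;
import org.apache.beam.sdk.values.TypeDescriptors;

// Hypothetical example class: builds a small DAG with two roots and one leaf.
public class PTransformDagSketch {

  // Constructs (but does not run) the DAG and returns the PDone produced by the leaf.
  static PDone buildDag(Pipeline pipeline, String outputPrefix) {
    // PBegin is the special input consumed by root transforms.
    PBegin begin = pipeline.begin();

    // Two roots: each Create is a PTransform<PBegin, PCollection<String>>.
    PCollection<String> first = begin.apply("CreateFirst", Create.of("a", "b"));
    PCollection<String> second = begin.apply("CreateSecond", Create.of("c"));

    // PCollectionList groups several PCollections into one PInput for Flatten.
    PCollection<String> merged =
        PCollectionList.of(first).and(second).apply("Merge", Flatten.<String>pCollections());

    // An ordinary PCollection-to-PCollection transform (an inner node of the DAG).
    PCollection<String> upper =
        merged.apply(
            "ToUpper",
            MapElements.into(TypeDescriptors.strings()).via((String s) -> s.toUpperCase()));

    // The leaf: TextIO.write() is a PTransform<PCollection<String>, PDone>.
    return upper.apply("Write", TextIO.write().to(outputPrefix));
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Hypothetical output location; TextIO writes sharded files with this prefix.
    buildDag(pipeline, "/tmp/ptransform-dag/out");
    pipeline.run().waitUntilFinish();
  }
}
```

Note that nothing in the sketch marks the roots or leaves explicitly: the DAG's shape follows entirely from the input and output types of the applied PTransforms.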
A PTransform is a recursive...