Reproducible Data Science with Pachyderm
This section walks you through the main Pachyderm pipeline concepts. The Pachyderm Pipeline System (PPS) is the centerpiece of Pachyderm functionality.
A Pachyderm pipeline is a sequence of computational tasks that data passes through before the pipeline produces the final result. For example, it could be a series of image processing tasks, such as labeling each image or applying a photo filter. Or it could compare two datasets or find similarities between them.
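To make the idea of "a sequence of computational tasks" concrete, here is a toy, in-memory Python sketch. It is not Pachyderm's API: the task functions `label` and `apply_filter` and the helper `run_pipeline` are hypothetical, and the "photo filter" is stood in for by uppercasing strings. It only shows how data flows through tasks in order until the last task yields the final result.

```python
from typing import Callable, Dict, List

# A hypothetical "task": takes a mapping of file name -> contents, returns a new mapping.
Task = Callable[[Dict[str, str]], Dict[str, str]]


def label(files: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical labeling task: prefix each item's contents with a label.
    return {name: f"labeled:{data}" for name, data in files.items()}


def apply_filter(files: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical stand-in for a photo filter: uppercase each item's contents.
    return {name: data.upper() for name, data in files.items()}


def run_pipeline(tasks: List[Task], files: Dict[str, str]) -> Dict[str, str]:
    # Pass the data through each task in order; the last task's output is the result.
    for task in tasks:
        files = task(files)
    return files


result = run_pipeline([label, apply_filter], {"cat.png": "cat", "dog.png": "dog"})
print(result)  # {'cat.png': 'LABELED:CAT', 'dog.png': 'LABELED:DOG'}
```

In a real Pachyderm pipeline, each step runs your own code in a container and reads from and writes to versioned repositories rather than a Python dict.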
A pipeline performs the following three steps:
The following diagram shows how a Pachyderm pipeline works:
Figure 2.4 – Pachyderm pipeline
Each Pachyderm pipeline has an input repository and an output repository. An input repository is a filesystem within Pachyderm where data is placed from an outside source...
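As an illustration of how a pipeline is tied to its input repository, the following minimal sketch builds a pipeline specification as a Python dict and serializes it to JSON, assuming Pachyderm's JSON pipeline spec fields `pipeline.name`, `input.pfs.repo`, `input.pfs.glob`, `transform.image`, and `transform.cmd`. The pipeline name `photo-filter`, the repository `images`, the container image, and the script path are hypothetical placeholders.

```python
import json

# Minimal sketch of a Pachyderm pipeline spec; all names below are hypothetical.
pipeline_spec = {
    "pipeline": {"name": "photo-filter"},
    "input": {
        "pfs": {
            "repo": "images",  # input repository that outside data is placed into
            "glob": "/*",      # treat each top-level file or directory as a separate datum
        }
    },
    "transform": {
        "image": "python:3.10-slim",           # hypothetical container image
        "cmd": ["python3", "/app/filter.py"],  # hypothetical user code to run on each datum
    },
}

# Pachyderm writes results to an output repository with the same name as the pipeline.
output_repo = pipeline_spec["pipeline"]["name"]

spec_json = json.dumps(pipeline_spec, indent=2)
print(spec_json)
print("output repo:", output_repo)
```

Saved to a file, a spec like this could then be submitted with `pachctl create pipeline -f <file>`. Building it in Python is only a convenience here; the spec is usually written directly as JSON or YAML.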