Learning Cascading

Understanding how Cascading controls data flow

Now that we know what a Cascading record looks like, how do we process these records? How do we move and manipulate data? Cascading provides the concept of pipes. Pipes control how data moves and is transformed during processing.

Using pipes

Pipes are the components that act on data as it flows through an application. The Cascading API allows the developer to build pipe assemblies that split, merge, group, or join streams. As data moves through pipes, streams may be separated or combined for various purposes:

Figure 2.3 – Pipe definition
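
To make the idea of separating and combining streams concrete, here is a minimal sketch in plain Java. It is a toy model only: the class StreamShapes and its methods branch, merge, and groupByInitial are hypothetical names invented for this illustration and are not part of the Cascading API. Records are modelled as simple strings rather than Tuples, and the merge here preserves order, which a real distributed merge need not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model only: StreamShapes and its methods are hypothetical, not Cascading classes.
final class StreamShapes {

    // Split: each branch reads the whole input and keeps what its predicate accepts.
    static List<String> branch(List<String> stream, Predicate<String> keep) {
        List<String> out = new ArrayList<>();
        for (String record : stream) {
            if (keep.test(record)) {
                out.add(record);
            }
        }
        return out;
    }

    // Merge: combine two streams into one (left records first in this toy version).
    static List<String> merge(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Group: collect records under a key, here the first character.
    // Assumes every record is a non-empty string.
    static Map<Character, List<String>> groupByInitial(List<String> stream) {
        Map<Character, List<String>> groups = new LinkedHashMap<>();
        for (String record : stream) {
            groups.computeIfAbsent(record.charAt(0), k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bee", "avocado", "banana");
        List<String> shortWords = branch(words, w -> w.length() <= 5);
        List<String> longWords = branch(words, w -> w.length() > 5);
        List<String> merged = merge(shortWords, longWords);
        System.out.println(groupByInitial(merged)); // prints {a=[apple, avocado], b=[bee, banana]}
    }
}
```

The input is split into two branches, the branches are merged back into a single stream, and the merged stream is then grouped by key. These are the same shapes that a pipe assembly expresses, only here on in-memory lists.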

Some pipes, such as Merge, GroupBy, and the Join classes, perform a single action on entire Tuple streams. Others, such as Each and Every, require an operation to be attached to them. It is the operation code that performs the desired task. We will look at operations briefly in this chapter, though Chapter 3, Understanding Custom Operations, will explore them in detail.
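
The difference between a pipe that acts on its own and a pipe that only hosts an operation can be sketched in plain Java. This is an analogy, not Cascading code: the class OperationSketch and its methods each, groupBy, and every are hypothetical stand-ins named after the pipes mentioned above, and the operations are ordinary Java functions rather than Cascading operation classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model only: OperationSketch is hypothetical and does not use Cascading's classes.
final class OperationSketch {

    // Each-style: does no work itself; applies the attached operation to every record.
    static List<String> each(List<String> stream, Function<String, String> operation) {
        List<String> out = new ArrayList<>();
        for (String record : stream) {
            out.add(operation.apply(record));
        }
        return out;
    }

    // GroupBy-style: needs no operation; acts on the whole stream by itself.
    static Map<String, List<String>> groupBy(List<String> stream) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String record : stream) {
            groups.computeIfAbsent(record, k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    // Every-style: applies the attached operation once per group.
    static <R> Map<String, R> every(Map<String, List<String>> groups,
                                    Function<List<String>, R> operation) {
        Map<String, R> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            out.put(group.getKey(), operation.apply(group.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Pipe", "tap", "pipe", "TAP", "pipe");
        List<String> lower = each(words, String::toLowerCase);         // operation: lowercase
        Map<String, Integer> counts = every(groupBy(lower), List::size); // operation: count
        System.out.println(counts); // prints {pipe=3, tap=2}
    }
}
```

Note that each and every take the work to be done as a parameter, while groupBy takes none. Swapping in a different function changes what the flow computes without changing how the records are routed.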

Creating and chaining

Pipes are created through both declaration and instantiation, and...