Learning Cascading


Understanding how Cascading represents records


Now that you have had a glimpse of how to implement a data processing system using Cascading, let's dive into its internals. In this section, we will learn how to define and structure data streams for Cascading processing.

Using tuples and defining fields

The idea of a tuple is very similar to that of a record in a database.

Tuples (cascading.tuple.Tuple) store a vector, or collection, of values that are addressed by offset and can be associated with specific object types and names. A single tuple can hold data objects of different types. A series of tuples makes up a tuple stream. A simple example of a tuple is [String name, Integer age]. The Tuple class offers a large set of methods to get, set, append, and remove fields.
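To make the idea of positional values with associated names concrete, here is a minimal plain-Java sketch. It is not the Cascading API: the class PositionalTuple and all of its methods are hypothetical stand-ins, written only to illustrate how a [String name, Integer age] record can be addressed by offset or by name and then modified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, NOT cascading.tuple.Tuple: it only models the idea
// of positional values paired with a list of field names.
public class PositionalTuple {
    private final List<String> fieldNames;                  // names, in positional order
    private final List<Object> values = new ArrayList<>();  // values, addressed by offset

    public PositionalTuple(List<String> fieldNames, Object... values) {
        if (fieldNames.size() != values.length) {
            throw new IllegalArgumentException("one name is required per value");
        }
        this.fieldNames = new ArrayList<>(fieldNames);
        this.values.addAll(Arrays.asList(values));
    }

    // Positional access: the offset is the primary address of a value.
    public Object get(int pos) {
        return values.get(pos);
    }

    // Named access: resolve the name to its offset, then read positionally.
    public Object get(String name) {
        int pos = fieldNames.indexOf(name);
        if (pos < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return values.get(pos);
    }

    // Overwrite the value stored at the given offset.
    public void set(int pos, Object value) {
        values.set(pos, value);
    }

    // Add a new named value at the end of the tuple.
    public void append(String name, Object value) {
        fieldNames.add(name);
        values.add(value);
    }

    // Drop the value (and its name) at the given offset; later values shift left.
    public void remove(int pos) {
        fieldNames.remove(pos);
        values.remove(pos);
    }

    public int size() {
        return values.size();
    }

    public static void main(String[] args) {
        // A tuple shaped like [String name, Integer age]; the data is made up.
        PositionalTuple person =
            new PositionalTuple(Arrays.asList("name", "age"), "Alice", 34);

        System.out.println(person.get(0));      // prints "Alice"
        System.out.println(person.get("age"));  // prints "34"

        person.append("city", "Boston");
        person.set(1, 35);
        person.remove(0);
        System.out.println(person.size());      // prints "2"
    }
}
```

The sketch keeps names and values in two parallel lists so that the offset stays the primary address and a name is just a lookup that resolves to an offset. That design choice is an assumption made for illustration, not a description of how Cascading implements its tuples.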

Strictly speaking, a tuple stores data in positional columns, such that each column is accessed by its ordinal position. Each element is assumed to represent a single data element, and all are assumed to be of a standard Java...