Pig Design Patterns
The data generalization pattern transforms data by creating concept hierarchies and replacing raw values with concepts drawn from those hierarchies.
This design pattern explores the implementation of data generalization through a Pig script. Data generalization is the process of creating top-level summary layers, called concept hierarchies, that describe the underlying data concept in a general form. It is a descriptive approach in which the data is grouped and replaced by higher-level categories or concepts taken from the concept hierarchies. For example, the raw values of the attribute age can be replaced with conceptual labels (such as adult, teenager, or toddler), or they can be replaced by interval labels (0 to 5, 13 to 19, and so on). These labels, in turn, can be recursively organized into higher-level concepts, resulting in a concept hierarchy for the attribute.
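The age example above can be sketched in a few lines of code. This is a minimal illustration in Python rather than Pig, intended only to show the idea of replacing raw values with labels from a concept hierarchy. Only the intervals 0 to 5 and 13 to 19 and the labels toddler, teenager, and adult come from the text; the remaining boundaries, the "child" label, the second-level concepts "young" and "adult", and all function names are assumptions made for illustration.

```python
from collections import Counter

# Level-1 concept hierarchy as (low, high, label) intervals.
# Only 0-5 and 13-19 appear in the text; the other boundaries and the
# "child" label are assumptions made for this sketch.
AGE_LEVELS = [
    (0, 5, "toddler"),
    (6, 12, "child"),
    (13, 19, "teenager"),
    (20, 150, "adult"),
]

# Hypothetical level-2 hierarchy that groups level-1 labels further.
HIGHER_LEVEL = {
    "toddler": "young",
    "child": "young",
    "teenager": "young",
    "adult": "adult",
}


def generalize_age(age):
    """Replace a raw age with its level-1 conceptual label."""
    for low, high, label in AGE_LEVELS:
        if low <= age <= high:
            return label
    return "unknown"  # ages outside every interval


def generalize(ages, level=1):
    """Generalize raw ages to the requested level and count each concept."""
    labels = [generalize_age(a) for a in ages]
    if level == 2:
        labels = [HIGHER_LEVEL.get(label, "unknown") for label in labels]
    return Counter(labels)


if __name__ == "__main__":
    ages = [3, 15, 42, 16]
    print(generalize(ages, level=1))
    print(generalize(ages, level=2))
```

Each level of the hierarchy yields a coarser summary of the same records: the first call counts records per conceptual label, and the second rolls those labels up into the higher-level concepts.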
In the context of Big Data, a typical analytics pipeline on huge volumes of...