Pig Design Patterns
This section describes the basic statistical profiling design pattern in which we use Pig scripts to apply statistical functions to capture important information about data quality.
The previous design pattern depicts one way of inferring the data type. The next logical step in the data profiling process is to evaluate the quality metrics of the values. This is done by collecting the data and analyzing it with statistical methods. These statistics provide a high-level overview of how suitable the data is for a particular analytical problem, and they uncover potential problems early in the data lifecycle.
The basic statistical profiling design pattern helps to create data quality metadata that includes basic statistics, such as mean, median, mode, maximum, minimum, and standard deviation. These statistics give you a summary snapshot of the entire data field, and tracking these statistics over time will give insights into the...
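To make the listed statistics concrete, here is a minimal sketch that profiles a single numeric field. It is written in Python rather than Pig and is not the book's script; the function name `profile_field`, the treatment of `None` as a missing value, and the choice of population (rather than sample) standard deviation are all assumptions made for illustration.

```python
import statistics


def profile_field(values):
    """Compute basic profiling statistics for one numeric field.

    Assumption: None marks a missing value and is excluded from the statistics.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("field has no non-missing values to profile")
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        # multimode returns every most-frequent value, so ties are not hidden
        "mode": statistics.multimode(present),
        "min": min(present),
        "max": max(present),
        # population standard deviation (assumption; use stdev for a sample)
        "stddev": statistics.pstdev(present),
    }


if __name__ == "__main__":
    # Hypothetical field values, with one missing entry
    sample = [2, 4, 4, None, 4, 5, 5, 7, 9]
    for name, value in profile_field(sample).items():
        print(f"{name}: {value}")
```

Storing the returned dictionary alongside a load date would be one way to track these statistics over time, as the pattern suggests.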