Pig Design Patterns

By: Pradeep Pasupuleti

The pattern-matching pattern


This section describes the pattern-matching design pattern, in which we use Pig scripts to match numeric and text patterns to ascertain whether the data is internally coherent, and thus obtain a measure of data quality.

Background

In the enterprise context, the data is examined for coherence after it has been ingested and its completeness and correctness have been ascertained. The values of a given attribute can come in different shapes and sizes. This is especially true for fields requiring human input, where the values are entered according to the whims of the user. A column representing the phone number field can be called coherent when all of its values represent valid phone numbers, that is, when they match the expected format, length, and data type (numeric) and thus meet the expectation of the system. Representing data in an incorrect format leads to inaccurate analytics, and in the Big Data context, the sheer volume of data can amplify this inaccuracy...
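
To make the idea of coherence concrete, the following is a minimal Python sketch, not the book's Pig implementation. It assumes a hypothetical rule that a valid phone number is exactly ten digits with no separators. The name `coherence_ratio` and the sample values are invented for illustration. The whole-string match used here is similar in spirit to Pig's `MATCHES` operator, which also requires the entire value to match the regular expression.

```python
import re

# Assumed rule (hypothetical): a coherent phone number is exactly ten digits.
PHONE_PATTERN = re.compile(r"[0-9]{10}")


def coherence_ratio(values, pattern=PHONE_PATTERN):
    """Return the fraction of values that match the pattern in full.

    Missing values (None) are counted as not matching.
    """
    if not values:
        return 0.0
    matched = sum(
        1 for value in values
        if value is not None and pattern.fullmatch(value)
    )
    return matched / len(values)


# Made-up sample values, for illustration only.
sample = ["4085551234", "408-555-1234", "unknown", "6505550000"]
print(coherence_ratio(sample))  # 0.5: two of the four values match
```

A ratio well below 1.0 suggests that the column mixes several formats or contains free-text entries, which is the kind of signal this pattern uses as a measure of data quality.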