Pig Design Patterns

By: Pradeep Pasupuleti

The constraint validation and cleansing design pattern


The constraint validation and cleansing pattern validates the data against a set of rules and techniques and then cleanses the data that fails validation.

Background

Constraints describe the properties that the data should comply with. They can be applied to an entire database, a schema, a table, or a column. These constraints are rules created at design time to prevent the data from becoming corrupt and to reduce the overhead of processing wrong data; they dictate which values are valid for a given data element.

Constraints, such as null checks and range checks, can be used to determine whether the data ingested into Hadoop is valid. Often, constraint validation and cleansing of data in Hadoop are driven by business rules, which determine the type of constraint to apply to a particular subset of the data.
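
To make the null check and range check concrete, here is a minimal sketch written in plain Python rather than Pig Latin. The record layout, the `amount` field, and the bounds `MIN_AMOUNT` and `MAX_AMOUNT` are hypothetical stand-ins for a business rule; they are assumptions for illustration and do not come from the text.

```python
from typing import Optional

# Hypothetical business rule (an assumption for this sketch):
# an amount must be present and lie within [MIN_AMOUNT, MAX_AMOUNT].
MIN_AMOUNT = 0.0
MAX_AMOUNT = 10_000.0


def is_valid(amount: Optional[float]) -> bool:
    """Apply a null check first, then a range check."""
    return amount is not None and MIN_AMOUNT <= amount <= MAX_AMOUNT


def split_records(records):
    """Partition records into (valid, invalid) lists based on the constraints."""
    valid, invalid = [], []
    for record in records:
        target = valid if is_valid(record.get("amount")) else invalid
        target.append(record)
    return valid, invalid


# Hypothetical sample data.
records = [
    {"id": "t1", "amount": 250.0},
    {"id": "t2", "amount": None},   # fails the null check
    {"id": "t3", "amount": -40.0},  # fails the range check
]

valid, invalid = split_records(records)
```

Keeping the invalid records in their own collection, instead of dropping them, leaves them available for a later cleansing step or for review against the business rule.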

In cases where a given column has to belong to a particular type, a data type constraint is applied. When we...