Pig Design Patterns

By: Pradeep Pasupuleti

Data profiling for Big Data


Bad data lurks in all of the data ingested by Hadoop, but its impact is magnified by the sheer volume and variety that constitute Big Data. Working with missing records, malformed values, and wrong file formats increases the amount of wasted time. What drives us to frustration is seeing how much data we cannot use even though we have it, data that we had at hand and then lost, and data that is no longer the same as it was yesterday. In a Big Data analytics project, it is common to be handed a very large dataset with little information about where it came from, how it was collected, what the fields mean, and so on. In many cases, the data has passed through many hands and multiple transformations since it was gathered, and nobody really knows what it all means anymore.

Data profiling measures how good the data is and how fit it is for processing in the subsequent steps. It simply indicates what is wrong with the data. Data profiling...
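
To make the idea of "indicating what is wrong with the data" concrete, here is a minimal sketch in plain Python (not Pig, and not the book's own implementation). It counts missing and non-numeric values per column, which are two of the problems named above. The function name `profile`, the column names, and the sample rows are all hypothetical and exist only for illustration.

```python
import csv
import io
from collections import Counter


def profile(rows, columns):
    """Count missing, numeric, and non-numeric values per column.

    A toy profile under the assumption that every listed column
    is expected to hold numbers; real profiling covers far more.
    """
    report = {c: Counter() for c in columns}
    total = 0
    for row in rows:
        total += 1
        for c in columns:
            # Treat an absent field and an empty string the same way.
            value = (row.get(c) or "").strip()
            if value == "":
                report[c]["missing"] += 1
                continue
            try:
                float(value)
                report[c]["numeric"] += 1
            except ValueError:
                report[c]["non_numeric"] += 1
    return total, report


# Hypothetical sample data, invented for this illustration.
SAMPLE = """id,amount
1,10.5
2,
3,abc
4,7
"""

total, report = profile(csv.DictReader(io.StringIO(SAMPLE)), ["id", "amount"])
for column, counts in report.items():
    print(column, dict(counts), "of", total, "rows")
```

Even a summary this small shows which columns need attention before any later processing step depends on them. At Hadoop scale, the same counts would be computed in a distributed job rather than in a single in-memory loop.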