Pig Design Patterns
Time looking at your data is always well spent.
--Witten, et al.
In the previous chapter, you studied the various patterns for ingesting different types of data into the Hadoop ecosystem and egressing them from it, so that the next logical steps in the analytics process can begin. In this chapter, we will examine the most widely used design patterns related to data profiling. This chapter takes a step-by-step approach to diagnosing whether your dataset has any problems and, ultimately, to turning the dataset into usable information.
Data profiling is a necessary first step toward any meaningful insight into the data ingested by Hadoop: it establishes the content, context, structure, and condition of that data.
The data profiling design patterns described in this chapter collect important information about the attributes of data in the Hadoop cluster before the process of cleaning the data into a more useful form begins. In this chapter, we will look at the following...