In the next few chapters, we will build an end-to-end Data Lake solution using HDInsight. As discussed in Chapter 2, Enterprise Data Lake using HDInsight, the three key components required for a Data Lake are:
Ingest and organize
Transform
Access, analyze, and report
To illustrate these concepts, we will use real flight on-time performance data, available from the RITA website at http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236.
In this chapter, we will focus on the ingest and organize components: