Hands-On Data Warehousing with Azure Data Factory

By: Christian Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Because data is everywhere, ETL will remain a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory (ADF) and SSIS can be used to understand the key components of an ETL solution. With the help of practical examples, you will go through the different Azure services that ADF and SSIS can use, such as Azure Data Lake Analytics, Machine Learning, and Spark on Databricks. You will explore how to design and implement hybrid ETL solutions using different integration services with a step-by-step approach. Once you get to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will have learned not only how to build your own ETL solutions but also how to address the key challenges faced while building them.
Table of Contents (12 chapters)

Chapter 4. Azure Data Lake

One of the biggest problems that mid-sized and enterprise organizations face is that data resides everywhere. Over the years, data has usually been accumulated by different systems, whether third-party or in-house developed applications. Many vendors have required that their database servers be segregated in order to ensure the performance, security, and manageability of their systems. In addition, third-party vendors did not, or do not, want to take responsibility for their systems in a shared environment.

Organizations are realizing, or have already realized, that consolidation is a must, both from a cost perspective and for easier manageability. However, in many cases the vendors or developers are no longer to be found, which makes it very hard to decide whether to upgrade and/or migrate to the cloud. What could complicate things even further is the fact that shared or centralized data may be replicated everywhere and there may not even be one...