Hands-On Data Warehousing with Azure Data Factory

By: Christian Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Given that data is everywhere, ETL will always be a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to understand the key components of an ETL solution. With the help of practical examples, you will go through different services offered by Azure that can be used by ADF and SSIS, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark. You will explore how to design and implement hybrid ETL solutions using different integration services with a step-by-step approach. Once you get to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will not only have learned how to build your own ETL solutions but also how to address the key challenges faced while building them.

Leveraging our package in ADF V2

So far, we haven't done anything new, in the sense that everything we did was on-premises. This part of the book will focus on leveraging SSIS packages in the cloud.

Before ADF V2, the only way to achieve orchestration with SSIS was to schedule our SSIS load on an on-premises (or an Azure) virtual machine, and then schedule an ADF V1.0 pipeline to run every n minutes. If the data was not available at a specific time, the next ADF run would pick it up. Otherwise, we had to tell ADF to wait for the data before processing the rest of its pipeline.
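The "run every n minutes and let the next run pick it up" pattern described above can be illustrated with a small simulation. This is only a sketch in plain Python, not ADF or SSIS code: the function name `run_scheduled_pipeline`, the `data_ready` callback, and the 25-minute load time are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, List


def run_scheduled_pipeline(
    data_ready: Callable[[int], bool],
    interval_minutes: int,
    horizon_minutes: int,
) -> List[str]:
    """Simulate a pipeline that is triggered every `interval_minutes`.

    Each run checks whether the upstream load has produced its data.
    If not, the run does nothing and the next scheduled run tries again.
    The simulation stops at the first run that finds the data.
    """
    log: List[str] = []
    for minute in range(0, horizon_minutes, interval_minutes):
        if data_ready(minute):
            log.append(f"t={minute}: data available, pipeline processed it")
            break
        log.append(f"t={minute}: data not available, deferred to next run")
    return log


# Hypothetical scenario: the upstream load finishes at minute 25,
# while the pipeline is scheduled every 15 minutes.
log = run_scheduled_pipeline(lambda minute: minute >= 25,
                             interval_minutes=15,
                             horizon_minutes=60)
for entry in log:
    print(entry)
```

In this scenario the data is ready at minute 25 but is only processed by the run at minute 30, which shows the latency that fixed-interval scheduling introduces when the two schedules are not coordinated.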

Also, with the advent of SSIS 2017, scaling out package execution had to be done on-premises. There are a couple of issues with this approach:

  • Who is responsible for the different uses of the data warehouse data? The developers who create and maintain the packages are not necessarily aware of the cloud implications of their processes. The data might be used in systems other than the ones they had in their specifications when they first developed...