Hands-On Data Warehousing with Azure Data Factory

By : Christian Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Given that data is everywhere, ETL will always be a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to build the key components of an ETL solution. You will work through the various Azure services that ADF and SSIS can leverage, such as Azure Data Lake Analytics, Azure Machine Learning, and Databricks Spark, with the help of practical examples. You will explore how to design and implement hybrid ETL solutions using different integration services with a step-by-step approach. Once you have a grip on all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will not only be able to build your own ETL solutions but also address the key challenges that are faced while building them.

Prepare the data to ingest

Now that we've created our cluster and folder, we must prepare some data to work with. For this book, we're using the data warehouse data available in the on-premises SQL Server database we created in the first chapters. This will allow us to see another integration runtime: self-hosted. We'll copy the data into the Azure storage account created previously in this chapter.
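In ADF v2, a copy like this is ultimately expressed as a pipeline with a Copy activity that reads from a SQL Server dataset and writes to a blob dataset. The following is a minimal sketch of such a pipeline definition; the dataset names (OnPremSalesDataset, SalesDataBlobDataset) are hypothetical placeholders for the datasets we will define along the way:

```json
{
  "name": "CopySalesDataPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyFromOnPremSqlToBlob",
        "type": "Copy",
        "inputs": [
          { "referenceName": "OnPremSalesDataset", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "SalesDataBlobDataset", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "SqlSource" },
          "sink": { "type": "BlobSink" }
        }
      }
    ]
  }
}
```

The source dataset will resolve to the on-premises database through the self-hosted integration runtime we set up later in this section, while the sink dataset points at the Azure storage account.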

Setting up the folder in the Azure storage account

Going back to the Azure portal, we'll create a blob container and call it sales-data, as shown in the following screenshot:

Enter a valid name: sales-data. Click on OK when done. This creates the container in the adfv2book blob storage account, as shown in this screenshot:
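Once the container exists, ADF can reference it through a blob dataset. Below is a sketch of an ADF v2 dataset pointing at the sales-data container in the adfv2book account; the dataset and linked service names are hypothetical, and the actual file format depends on how we land the data:

```json
{
  "name": "SalesDataBlobDataset",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": {
      "referenceName": "AzureStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "folderPath": "sales-data",
      "format": { "type": "TextFormat" }
    }
  }
}
```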

Now that the container is ready, let's go back to the factory to prepare the self-hosted runtime.

Self-hosted integration runtime

A self-hosted integration runtime is necessary when we want ADF to access data on an on-premises Windows machine. It acts as a secure tunnel that allows ADF to...
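In JSON terms, the self-hosted integration runtime is declared as a factory resource, and any on-premises linked service then routes through it via the connectVia property. The following is a minimal sketch; the runtime and linked service names, as well as the connection string details, are hypothetical:

```json
{
  "name": "SelfHostedIR",
  "properties": {
    "type": "SelfHosted",
    "description": "Runtime installed on the on-premises Windows machine"
  }
}
```

```json
{
  "name": "OnPremSqlLinkedService",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Server=localhost;Database=SalesDW;Integrated Security=True"
    },
    "connectVia": {
      "referenceName": "SelfHostedIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}
```

After creating the runtime in ADF, we still need to install the integration runtime software on the on-premises machine and register it with the key that ADF generates; without that registration step, the connectVia reference cannot reach the local SQL Server.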