
Hands-On Data Warehousing with Azure Data Factory

By: Christian Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Given that data is everywhere, ETL will always be a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how to use Azure Data Factory and SSIS to understand the key components of an ETL solution. With the help of practical examples, you will go through different Azure services that ADF and SSIS can use, such as Azure Data Lake Analytics, Machine Learning, and Spark on Azure Databricks. You will explore, step by step, how to design and implement hybrid ETL solutions using different integration services. Once you get to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will have learned not only how to build your own ETL solutions but also how to address the key challenges faced while building them.
Table of Contents (12 chapters)

Calling Databricks notebook execution in ADF


We have now laid down everything needed to trigger the notebook execution in ADF. Going back to the factory, we're going to add a linked service. So far, every linked service we created in this book connected to a data store: SQL Server, blob storage, and so on. This time, we're going to use a compute linked service: Azure Databricks.
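A compute linked service is what a pipeline activity runs on: to trigger a notebook, ADF uses a Databricks Notebook activity (type `DatabricksNotebook`) that references the Azure Databricks linked service. The following Python sketch builds such an activity definition as a dictionary. The helper `notebook_activity`, the activity and linked service names, and the notebook path are hypothetical placeholders, not values from this book.

```python
import json


def notebook_activity(name, linked_service, notebook_path, parameters=None):
    """Build a hypothetical ADF Databricks Notebook activity definition."""
    activity = {
        "name": name,
        "type": "DatabricksNotebook",
        # The activity runs on the compute linked service, not on a data store.
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"notebookPath": notebook_path},
    }
    if parameters:
        # baseParameters are key/value pairs; the notebook reads them as
        # strings through dbutils.widgets.get, so values are converted here.
        activity["typeProperties"]["baseParameters"] = {
            key: str(value) for key, value in parameters.items()
        }
    return activity


if __name__ == "__main__":
    activity = notebook_activity(
        name="RunNotebook",                        # hypothetical name
        linked_service="ADBLinkedService",         # hypothetical name
        notebook_path="/Users/<user>/<notebook>",  # placeholder path
        parameters={"run_date": "2018-09-01"},     # hypothetical parameter
    )
    print(json.dumps(activity, indent=2))
```

Keeping the activity separate from the linked service means the same Databricks connection can serve several notebook activities in the pipeline.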

As shown in the following screenshot, add a linked service. Click on the Compute tab, select Azure Databricks, and click on Continue:

In the next step, we enter the details of the cluster. The Type property is set to Azure Databricks. The following screenshot shows the properties to set:

The properties are explained as follows:

  • Connect via integration runtime: We use the default one, which has access to all Azure resources.
  • Account selection method: From Azure subscription.
  • Select cluster: We're going to create a cluster on the fly that runs only for the duration of our job; therefore, we select...
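Behind the UI, ADF stores the linked service as a JSON definition. As a minimal sketch, assuming a new job cluster is created on the fly, the following Python code builds such a definition as a dictionary. The helper `databricks_linked_service`, the linked service name, workspace URL, token, node type, and runtime version are hypothetical placeholders. Leaving out `connectVia` makes ADF use the default Azure integration runtime, which corresponds to the default choice above.

```python
import json


def databricks_linked_service(name, domain, access_token, node_type,
                              cluster_version, num_workers="1"):
    """Build a hypothetical ADF linked-service definition for Azure
    Databricks that creates a new job cluster for each activity run."""
    return {
        "name": name,
        "properties": {
            "type": "AzureDatabricks",
            "typeProperties": {
                # Workspace URL, e.g. https://<region>.azuredatabricks.net
                "domain": domain,
                # Access token generated in the Databricks workspace
                "accessToken": {"type": "SecureString", "value": access_token},
                # The newCluster* properties ask ADF to create a job cluster
                # on the fly; "existingClusterId" would reuse a running one.
                "newClusterNodeType": node_type,
                "newClusterNumOfWorker": num_workers,
                "newClusterVersion": cluster_version,
            },
            # No "connectVia" key: ADF falls back to the default Azure
            # integration runtime.
        },
    }


if __name__ == "__main__":
    definition = databricks_linked_service(
        name="ADBLinkedService",                         # hypothetical name
        domain="https://<region>.azuredatabricks.net",   # placeholder URL
        access_token="<access-token>",                   # never hard-code a real token
        node_type="Standard_DS3_v2",                     # placeholder VM size
        cluster_version="<runtime-version>",             # placeholder runtime
    )
    print(json.dumps(definition, indent=2))
```

A job cluster created this way is billed only while the activity runs, which is the reason for choosing a new cluster over an existing interactive one in this scenario.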