Hands-On Data Warehousing with Azure Data Factory

By: Christian Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Given that data is everywhere, ETL will always be a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn the key components of an ETL solution and how Azure Data Factory and SSIS can be used to implement them. With the help of practical examples, you will go through different Azure services that ADF and SSIS can use, such as Azure Data Lake Analytics, Machine Learning, and Databricks' Spark. You will explore, step by step, how to design and implement hybrid ETL solutions using different integration services. Once you get to grips with all this, you will use Power BI to interact with data coming from different sources and reveal valuable insights. By the end of this book, you will have learned not only how to build your own ETL solutions but also how to address the key challenges faced while building them.
Table of Contents (12 chapters)

Azure Databricks setup


This section describes how to set up Databricks in Azure. Once you are logged in to the Azure portal, click Create a resource in the top-left corner and select the Analytics category. Then, as shown in the following screenshot, click Azure Databricks:

We're now redirected to the Azure Databricks Service blade. We first need to set up a workspace, as shown in the following screenshot:

The parameters are explained here:

  • Workspace name: ADFV2DataBricks
  • Subscription: The subscription in use when you logged in to Azure
  • Resource group: The same one we have used since the beginning: ADFV2Book
  • Location: The same location as all the resources used so far
  • Pricing Tier: Premium. This tier is mandatory to be able to connect Power BI to Databricks.
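For readers who prefer scripting this step, the same parameters can also be expressed as an Azure Resource Manager (ARM) template. The Python sketch below generates a minimal one. It is an illustration under stated assumptions, not the book's method: it assumes the `2018-04-01` API version of the `Microsoft.Databricks/workspaces` resource type, inherits the location from the target resource group, and uses a hypothetical `databricks-rg-<workspace name>` name for the managed resource group that Azure Databricks requires. The subscription is whichever one the deployment runs against.

```python
import json


def build_databricks_template(workspace_name: str, pricing_tier: str = "premium") -> dict:
    """Return a minimal ARM template for an Azure Databricks workspace.

    Assumptions (not taken from the book): API version 2018-04-01, the
    location is inherited from the resource group the template is deployed
    into, and the managed resource group name is purely illustrative.
    """
    # Azure Databricks offers standard and premium tiers; the text above
    # requires premium so that Power BI can connect to the workspace.
    if pricing_tier not in ("standard", "premium"):
        raise ValueError("pricing_tier must be 'standard' or 'premium'")

    # Hypothetical naming convention for the managed resource group.
    managed_rg = f"databricks-rg-{workspace_name}"

    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Databricks/workspaces",
                "apiVersion": "2018-04-01",
                "name": workspace_name,
                # Same location as the target resource group (for example, ADFV2Book).
                "location": "[resourceGroup().location]",
                "sku": {"name": pricing_tier},
                "properties": {
                    # ARM expression resolving to the managed resource group's full ID.
                    "managedResourceGroupId": f"[concat(subscription().id, '/resourceGroups/', '{managed_rg}')]"
                },
            }
        ],
    }


if __name__ == "__main__":
    template = build_databricks_template("ADFV2DataBricks")
    print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and deployed into the ADFV2Book resource group, for example with the Azure CLI command `az deployment group create --resource-group ADFV2Book --template-file <file>`. Treat it as a starting point and check the current API version of the resource type before relying on it.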

When you are done, check the Pin to dashboard option and click Create at the bottom left of the blade to create the workspace. After a few minutes, the workspace is ready to use, as shown in this screenshot:
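If the workspace is created from a script rather than the portal, the script has to wait out those few minutes itself. A minimal polling sketch follows. `get_state` is a hypothetical callback that is assumed to return the workspace's ARM `provisioningState` string (for example, by wrapping an SDK or REST call). The timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable

# ARM provisioning states after which no further change is expected.
TERMINAL_STATES = ("Succeeded", "Failed", "Canceled")


def wait_until_ready(
    get_state: Callable[[], str],
    timeout_s: float = 900.0,
    interval_s: float = 15.0,
) -> str:
    """Poll a provisioning-state callback until it reports a terminal state.

    get_state is a hypothetical callback; how it reads the workspace state
    (SDK, REST, CLI) is left to the caller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"workspace still '{state}' after {timeout_s} seconds")
        time.sleep(interval_s)
```

To try the helper without Azure, a fake callback can stand in for the real one, such as `states = iter(["Accepted", "Updating", "Succeeded"])` with `wait_until_ready(lambda: next(states), interval_s=0)`, which returns `"Succeeded"`. Returning the terminal state instead of raising on `"Failed"` leaves it to the caller to decide how to react.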

A workspace is a placeholder or folder where we store all...