Hands-On Data Warehousing with Azure Data Factory

By: Christian Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Since data is everywhere, ETL will always remain a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to implement the key components of an ETL solution. With the help of practical examples, you will go through the different Azure services that ADF and SSIS can work with, such as Azure Data Lake Analytics, Azure Machine Learning, and Spark on Azure Databricks. You will explore how to design and implement hybrid ETL solutions using different integration services, following a step-by-step approach. Once you get to grips with all of this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will not only know how to build your own ETL solutions but also how to address the key challenges faced while building them.
Table of Contents (12 chapters)

Using the data factory to manipulate data in the Data Lake


In the previous section, we created the Data Lake Analytics resource for the U-SQL task. Before proceeding, note the following:

  • Even though it is possible, running U-SQL directly against a SQL database is not at all straightforward: it involves tweaking firewalls and permissions. This is why we do not cover that scenario in the next section, which describes how to run a U-SQL job directly from the Data Lake Analytics resource.
  • It is much simpler to copy data from a SQL Server database to a file in Azure Blob storage via Azure Data Factory.
  • In this section, we show how to do this, and then how to manipulate the copied data with U-SQL through Azure Data Factory.
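As a preview of the U-SQL step, a summary job over a file copied to Blob storage could look like the following sketch. The storage account, container, file names, and columns here are hypothetical placeholders, not values from the book's example:

```sql
// Read the file that Azure Data Factory copied from the SQL Server view
// (assumed path and schema; adjust to your own storage account and data).
@orders =
    EXTRACT CustomerName string,
            OrderAmount  decimal
    FROM "wasb://data@mystorageaccount/orders.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// Aggregate the detail rows into a per-customer summary.
@summary =
    SELECT CustomerName,
           SUM(OrderAmount) AS TotalAmount
    FROM @orders
    GROUP BY CustomerName;

// Write the summary back to Blob storage as a new CSV file.
OUTPUT @summary
TO "wasb://data@mystorageaccount/orders-summary.csv"
USING Outputters.Csv(outputHeader: true);
```

The same pattern (EXTRACT, transform, OUTPUT) applies to whatever view you export in Task 1.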

We will now create a pipeline in Azure Data Factory that will do the following:

  • Task 1: Import data from SQL Server (from a view) into a file on blob storage
  • Task 2: Use U-SQL to export summary data to a file on blob storage
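In Azure Data Factory JSON terms, the two tasks above could be chained roughly as follows. This is a simplified sketch: the pipeline, dataset, and linked service names are hypothetical, and most properties (schedules, policies, connection details) are omitted:

```json
{
    "name": "CopyAndSummarizePipeline",
    "properties": {
        "activities": [
            {
                "name": "CopySqlViewToBlob",
                "type": "Copy",
                "inputs":  [ { "referenceName": "SqlServerViewDataset", "type": "DatasetReference" } ],
                "outputs": [ { "referenceName": "BlobOrdersDataset",    "type": "DatasetReference" } ],
                "typeProperties": {
                    "source": { "type": "SqlSource" },
                    "sink":   { "type": "BlobSink" }
                }
            },
            {
                "name": "SummarizeWithUSql",
                "type": "DataLakeAnalyticsU-SQL",
                "dependsOn": [
                    { "activity": "CopySqlViewToBlob", "dependencyConditions": [ "Succeeded" ] }
                ],
                "linkedServiceName": {
                    "referenceName": "AzureDataLakeAnalyticsLinkedService",
                    "type": "LinkedServiceReference"
                },
                "typeProperties": {
                    "scriptPath": "scripts/summarize.usql",
                    "scriptLinkedService": {
                        "referenceName": "BlobStorageLinkedService",
                        "type": "LinkedServiceReference"
                    }
                }
            }
        ]
    }
}
```

The `dependsOn` entry makes the U-SQL activity wait until the copy activity succeeds, which is the ordering the two tasks require.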

Task 1 – copy/import data from SQL Server to a blob storage file using data factory

Let's create...