Hands-On Data Warehousing with Azure Data Factory

By: Christian Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Given that data is everywhere, ETL will always remain a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to understand the key components of an ETL solution. You will go through the different services offered by Azure that can be used with ADF and SSIS, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark, with the help of practical examples. You will explore how to design and implement hybrid ETL solutions using different integration services, with a step-by-step approach. Once you get to grips with all of this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will not only know how to build your own ETL solutions but also how to address the key challenges that are faced while building them.
Table of Contents (12 chapters)

Run U-SQL from a job in Data Lake Analytics

In this section, we will learn how to create a Data Lake Analytics job that will debug and run a U-SQL script. This job will summarize data from the file created by Task 1 in the preceding data factory pipeline (the task that imports SQL Server data into a blob file). The summary data will be copied to a new file on the blob storage.

With U-SQL, we can join different blob files and manipulate or summarize the data. We can also import data from different data sources. In this section, however, we will only provide a very basic U-SQL script as an example.
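To give a sense of what such a basic script looks like, here is a minimal U-SQL sketch that summarizes a blob file and writes the result back to blob storage. The file paths, storage account, container, and column names below are hypothetical placeholders, not taken from the book's pipeline:

```sql
// Read the source file from the registered blob storage account.
// The wasb:// path, container, account, and schema are assumed placeholders.
@sales =
    EXTRACT Region string,
            Amount decimal
    FROM "wasb://container@myaccount.blob.core.windows.net/input/sales.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// Summarize the data: total amount per region.
@summary =
    SELECT Region,
           SUM(Amount) AS TotalAmount
    FROM @sales
    GROUP BY Region;

// Write the summary to a new file on the blob storage.
OUTPUT @summary
TO "wasb://container@myaccount.blob.core.windows.net/output/summary.csv"
USING Outputters.Csv(outputHeader: true);
```

The `EXTRACT`/`OUTPUT` pattern with the built-in `Extractors.Csv` and `Outputters.Csv` is the standard shape of a simple U-SQL job; the `wasb://` URI scheme is how a registered Blob Storage account is addressed from Data Lake Analytics.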

Let's get started...

First, we open the Data Lake Analytics resource from the dashboard. We need to add the Blob Storage account here, so open Data sources:

Click on Add data source:

Fill in the details:

You should see the added blob storage in the list:

You can explore the containers in the blob storage and files from the Data Lake Analytics | Data explorer:

Click on Data explorer:

In order to get the path...