Hands-On Data Warehousing with Azure Data Factory

By: Christian Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Given that data is everywhere, ETL will always remain a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to build the key components of an ETL solution. You will work through the Azure services that ADF and SSIS can use, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark, with the help of practical examples. You will explore how to design and implement hybrid ETL solutions using different integration services, following a step-by-step approach. Once you get to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will not only know how to build your own ETL solutions but also how to address the key challenges faced while building them.
Table of Contents (12 chapters)

Creating a Data Lake Analytics resource

In order to run a U-SQL task or job, we need to create a Data Lake Analytics resource. In the Azure dashboard, click on New to create a new resource, and search for the Data Lake Analytics resource in the new window:

Press Enter, and in the new window, click on Create:

Data Lake Analytics blade

Enter the name of the new resource (note that the resource name should contain only lowercase letters and numbers) and the rest of the information:

Click on the Data Lake Store section and choose the Data Lake Store we created previously:

And click on Create:

Find the new resource to ensure it was created:

All resources blade
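If you prefer scripting to the portal, the same steps can be sketched with the Azure CLI. This is a sketch, not part of the book's walkthrough: the resource group, store, and account names below are placeholders, the `az dla` commands require the Azure CLI with an active `az login` session, and the location is an arbitrary example.

```shell
# Hypothetical names -- substitute your own resource group, store, and account.
RG="myresourcegroup"
ADLS="mydatalakestore"   # the Data Lake Store created earlier
ADLA="mydlaaccount"      # lowercase letters and numbers only

# Enforce the portal's naming rule before calling Azure.
if echo "$ADLA" | grep -Eq '^[a-z0-9]+$'; then
  echo "name ok: $ADLA"
else
  echo "invalid account name: $ADLA" >&2
  exit 1
fi

# Only attempt the Azure calls when the az CLI is installed and logged in.
if command -v az >/dev/null 2>&1; then
  az dla account create \
    --account "$ADLA" \
    --resource-group "$RG" \
    --location "eastus2" \
    --default-data-lake-store "$ADLS"

  # Mirror the "All resources" check: confirm the account exists.
  az dla account show --account "$ADLA" --resource-group "$RG" --query name
fi
```

Note that, as in the portal, the account must be linked to a default Data Lake Store at creation time.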

We have created the Data Lake Analytics resource, and we can now run U-SQL to manipulate or summarize data. We can run U-SQL either directly from the Data Lake Analytics resource, as a job, or from Data Factory in a pipeline.

The next two sections will show you how to do the following:

  • Run U-SQL via a job in Data Lake Analytics...
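As a taste of what such a job looks like, here is a hedged sketch: a minimal U-SQL script (the input path, output path, and column names are invented for illustration) submitted as a job with the Azure CLI. The account name is a placeholder, and the `az dla job submit` step assumes the CLI is installed and logged in.

```shell
# Write a minimal U-SQL script locally: extract a CSV, aggregate, write a CSV.
# Paths and columns are hypothetical examples.
cat > summarize.usql <<'EOF'
@sales =
    EXTRACT city string,
            amount int
    FROM "/input/sales.csv"
    USING Extractors.Csv();

@totals =
    SELECT city,
           SUM(amount) AS total
    FROM @sales
    GROUP BY city;

OUTPUT @totals
TO "/output/totals.csv"
USING Outputters.Csv();
EOF

# Submit it as a job when the az CLI is available ("mydlaaccount" is the
# placeholder account created earlier).
if command -v az >/dev/null 2>&1; then
  az dla job submit --account mydlaaccount \
                    --job-name "summarize-sales" \
                    --script @summarize.usql
fi
```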