Hands-On Data Warehousing with Azure Data Factory

By: Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Because data is everywhere, ETL will remain a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to understand the key components of an ETL solution. With the help of practical examples, you will go through the different Azure services that can be used by ADF and SSIS, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark. You will explore how to design and implement hybrid ETL solutions using different integration services with a step-by-step approach. Once you get to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will not only have learned how to build your own ETL solutions but also how to address the key challenges faced while building them.
Table of Contents (8 chapters)

The Modern Data Warehouse

Azure Data Factory (ADF) is a service in the Microsoft Azure ecosystem that orchestrates data loads and transfers in Azure.

Back in 2014, there were hardly any easy ways to schedule data transfers in Azure. There were a few open source solutions available, such as Apache Falcon and Oozie, but nothing was easily available as a service in Azure. Microsoft introduced ADF in public preview in October 2014, and the service went to general availability in July 2015.

The service allows the following actions:

  • Copying data from various sources to various destinations
  • Calling various computation services, such as HDInsight and Azure SQL Data Warehouse, to transform data
  • Orchestrating the preceding activities using time slices and retrying the activities when there is an error
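
To make the idea of time slices and retries more concrete, here is a minimal Python sketch. It is a conceptual illustration only and does not use the ADF API: the names time_slices, run_with_retry, and orchestrate are hypothetical, and the default of three attempts is an assumption chosen for the example.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterator, Tuple

# A slice is a half-open time window: [slice_start, slice_end)
Slice = Tuple[datetime, datetime]


def time_slices(start: datetime, end: datetime, step: timedelta) -> Iterator[Slice]:
    """Yield consecutive windows of length `step` covering start..end."""
    current = start
    while current < end:
        # The last window is shortened so it never runs past `end`.
        yield current, min(current + step, end)
        current += step


def run_with_retry(activity: Callable[[Slice], None], window: Slice,
                   max_attempts: int = 3) -> int:
    """Run `activity` for one slice, retrying on error.

    Returns the number of attempts used; re-raises the last error
    once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            activity(window)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


def orchestrate(start: datetime, end: datetime, step: timedelta,
                activity: Callable[[Slice], None],
                max_attempts: int = 3) -> Dict[Slice, int]:
    """Run `activity` once per slice and record attempts per slice."""
    return {
        window: run_with_retry(activity, window, max_attempts)
        for window in time_slices(start, end, step)
    }
```

In this sketch, a copy or transformation step would be passed in as `activity`, and each hourly or daily window is processed independently, so a failure in one slice is retried without rerunning the others.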

All these activities were available via the Azure portal at first, and in Visual Studio 2013 before general availability (GA).