Hands-On Data Warehousing with Azure Data Factory

By: Christian Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Since data is everywhere, ETL will always remain the vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to build the key components of an ETL solution. You will go through the different services offered by Azure that can be used with ADF and SSIS, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark, with the help of practical examples. You will explore how to design and implement hybrid ETL solutions using different integration services, with a step-by-step approach. Once you get to grips with all of this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will not only know how to build your own ETL solutions but also how to address the key challenges faced while building them.

Incorporating Spark data


Now that we have created a report on the on-premises data warehouse, we'll add some data from the cloud to it: the table we created using Databricks. To extract data from there, the cluster must be an interactive one, and it must be running. To connect to Spark from Power BI, we use a connector that is still in preview but works quite well for now.
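The Spark connector in Power BI asks for the HTTPS endpoint of the cluster's Thrift server. As a minimal sketch, the helpers below assemble that endpoint from a workspace region, organization ID, and cluster ID; the URL layout and all values are illustrative assumptions, not taken from the book, so substitute the ones shown in your own Databricks workspace:

```python
def spark_http_path(org_id: str, cluster_id: str) -> str:
    """HTTP path that an interactive Databricks cluster's Thrift
    server listens on (assumed layout, placeholder IDs)."""
    return f"sql/protocolv1/o/{org_id}/{cluster_id}"


def spark_server_url(region: str, org_id: str, cluster_id: str) -> str:
    """Full HTTPS endpoint to paste into Power BI's Spark connector
    dialog (assumed layout; all arguments are placeholders)."""
    host = f"{region}.azuredatabricks.net"
    return f"https://{host}:443/{spark_http_path(org_id, cluster_id)}"


# Illustrative placeholder values -- replace with your own workspace's.
url = spark_server_url("eastus2", "1234567890", "0123-456789-abc123")
print(url)
```

Building the string in one place like this makes it easy to keep the region and cluster IDs in configuration rather than hardcoded in several reports.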

To connect to on-premises SQL Server, we used DirectQuery mode. DirectQuery is very convenient because it does not require importing the data into the model, but it has some limitations. The one that affects us most is that it cannot be used when the model connects to more than one data source. Since we're now going to connect the model to the Spark cluster as well, we'll lose the DirectQuery capability for SQL Server, but our Power BI model will contain data from multiple sources.

To connect to the Spark cluster, we'll use the JDBC connection available with the premium version of our Databricks cluster. As shown in...
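Other JDBC clients can reach the same cluster through a connection string rather than the Power BI dialog. As a hedged sketch, the helper below assembles a Spark-driver-style JDBC URL from a host and the same organization and cluster IDs; the `jdbc:spark://...;transportMode=http;...` layout is an assumption based on common Spark JDBC drivers, and every value shown is a placeholder:

```python
def databricks_jdbc_url(host: str, org_id: str, cluster_id: str) -> str:
    """Assemble a JDBC URL for a Databricks cluster in the style of
    Spark JDBC drivers (assumed format; placeholder values only)."""
    http_path = f"sql/protocolv1/o/{org_id}/{cluster_id}"
    return (
        f"jdbc:spark://{host}:443/default;"
        f"transportMode=http;ssl=1;httpPath={http_path}"
    )


# Illustrative placeholders -- use the values from your own cluster page.
print(databricks_jdbc_url("eastus2.azuredatabricks.net",
                          "1234567890", "0123-456789-abc123"))
```

Credentials are deliberately left out of the URL here; most drivers accept the token or user/password as separate connection properties, which keeps secrets out of logged connection strings.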