Hands-On Data Warehousing with Azure Data Factory

By: Christian Cote, Michelle Gutzait, Giuseppe Ciaburro

Overview of this book

ETL is one of the essential techniques in data processing. Because data is everywhere, ETL will remain a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how to use Azure Data Factory (ADF) and SSIS, and you will come to understand the key components of an ETL solution. With the help of practical examples, you will go through the different Azure services that ADF and SSIS can use, such as Azure Data Lake Analytics, Machine Learning, and Spark on Databricks. You will explore, step by step, how to design and implement hybrid ETL solutions using different integration services. Once you have come to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will know how to build your own ETL solutions and how to address the key challenges faced while building them.

The need for a data warehouse

A data warehouse is a repository of enterprise data used for reporting and analysis. There have been three waves of data warehouses so far, which we will cover in the upcoming subsections.

Driven by IT

This is the first wave of business intelligence (BI). IT needed to separate operational data and databases from their origin for the following reasons:

To keep a history of data changes. Some operational applications purge their data after a while.
To protect application performance. When users wanted to report on an application's data, they often affected the performance of the system. IT replicated the operational data to another server to avoid any performance impact on the applications.
To combine data from several systems. Things got more complex when users wanted to do analysis and reports on databases from multiple enterprise applications. IT had to replicate all the needed systems and make them work together. This meant that new structures had to be built, and new patterns emerged from there: star schemas, decision support systems (DSS), OLAP cubes, and so on.
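To make the star schema pattern mentioned above more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and chosen only for illustration; they do not come from the book. The idea is that one central fact table holds the measures and points to small dimension tables that describe them.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
# All names and values below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
INSERT INTO dim_date VALUES (20170101, 2017), (20180101, 2018);
INSERT INTO dim_product VALUES (1, 'Bike'), (2, 'Helmet');
INSERT INTO fact_sales VALUES
    (20170101, 1, 500.0), (20180101, 1, 700.0), (20180101, 2, 40.0);
""")


def sales_by_year_and_product(connection):
    """Join the fact table to its dimensions and total the sales amounts."""
    query = """
    SELECT d.year, p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.year, p.name
    ORDER BY d.year, p.name
    """
    return connection.execute(query).fetchall()


rows = sales_by_year_and_product(conn)
for year, name, total in rows:
    print(year, name, total)
```

Reports then become joins from the fact table out to whichever dimensions the analysis needs, which is one reason this layout became common for reporting databases.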

Self-service BI

Analysts and users always need data warehouses to evolve at a faster pace than IT can deliver. This is the second wave of BI, and it happened when major BI players such as Microsoft and Qlik came out with tools that enabled users to merge data with or without a data warehouse. In many enterprises, these tools are used as a temporary source of analytics or for proofs of concept. In addition, not all data could fit into data warehouses at that time. Many ad hoc reports were, and still are, built with self-service BI tools. Here is a short list of such tools:

Microsoft Power Pivot
Microsoft Power BI
Qlik

Cloud-based BI – big data and artificial intelligence

This is the third wave of BI. Cloud capabilities enable enterprises to do more accurate analysis. Big data technologies allow users to base their analysis on much bigger data volumes. This helps them derive patterns from the data and use technologies that incorporate and modify these patterns. This leads to artificial intelligence, or AI.

The technologies used in big data are not that new. They were used by search engines such as Yahoo! and Google in the early 21st century. They have also been used quite a lot in the research departments of different enterprises. The third wave of BI broadened the usage of these technologies. Vendors such as Microsoft, Amazon, and Google make them available to almost everyone through their cloud offerings.
