Cloud Scale Analytics with Azure Data Services

By: Patrik Borosch

Overview of this book

Azure Data Lake, the modern data warehouse architecture, and related data services on Azure enable organizations to build a customized analytical platform that fits any analytical requirements in terms of volume, speed, and quality. This book is your guide to the features and capabilities of Azure data services for storing, processing, and analyzing data of any size, whether structured, unstructured, or semi-structured. You will explore key techniques for ingesting and storing data and for performing batch, streaming, and interactive analytics. The book also shows you how to overcome challenges and complexities relating to productivity and scaling. Next, you will learn how to develop and run massive data workloads. Using a cloud-based setup that combines big data, a modern data warehouse, and analytics, you will also be able to build secure, scalable data estates for enterprises. Finally, you will learn not only how to develop a data warehouse but also how to build enterprise-grade security and auditing into big data programs. By the end of this Azure book, you will have learned how to develop a powerful and efficient analytical platform that meets enterprise needs.
Table of Contents (20 chapters)
Section 1: Data Warehousing and Considerations Regarding Cloud Computing
Section 2: The Storage Layer
Section 3: Cloud-Scale Data Integration and Data Transformation
Section 4: Data Presentation, Dashboarding, and Distribution

Using data lineage

Once the data factory is connected, it sends lineage information to your Purview environment for each pipeline run (lineage is reported for supported activities, such as the Copy activity). Give it a try: create a Data Factory pipeline that copies data from one folder to another in your data lake. Remember: the Copy Data Wizard is the quickest way to do this. Alternatively, simply reuse the MyFirstPipeline pipeline that you created in Chapter 5, Integrating Data in Your Modern Data Warehouse, if you used the data factory there.

When you are finished in the data factory, switch back to Purview and repeat your scan (again, this might take a few minutes). Then search for the newly created file or the pipeline name and, in the asset details, open the Lineage tab:

Figure 14.22 – First lineage overview for a Data Factory Copy pipeline
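Conceptually, each Copy run that Purview records adds a process node linking an input asset to an output asset, and the Lineage tab walks those links. The following Python sketch models that idea in memory to show why chaining two pipelines yields an end-to-end lineage path. It is not the Purview or Apache Atlas API: `LineageStore`, `record_copy_run`, the second pipeline `CleansePipeline`, the activity names `CopyRaw` and `CopyStaged`, and the folder paths are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LineageStore:
    """Hypothetical in-memory lineage graph (not the Purview/Atlas API)."""

    # Maps each output asset to the (process, inputs) pairs that produced it.
    producers: dict = field(default_factory=lambda: defaultdict(list))

    def record_copy_run(self, pipeline: str, activity: str, source: str, sink: str) -> None:
        # One Copy run is modeled as one process node linking source -> sink.
        process = f"{pipeline}/{activity}"
        self.producers[sink].append((process, [source]))

    def upstream(self, asset: str) -> set:
        """Return every asset and process that transitively feeds `asset`."""
        seen = set()
        stack = [asset]
        while stack:
            current = stack.pop()
            # .get() avoids inserting empty entries into the defaultdict.
            for process, inputs in self.producers.get(current, []):
                seen.add(process)
                for src in inputs:
                    if src not in seen:
                        seen.add(src)
                        stack.append(src)
        return seen


# Hypothetical example: two copy pipelines chained across three lake folders.
store = LineageStore()
store.record_copy_run("MyFirstPipeline", "CopyRaw", "raw/sales.csv", "staging/sales.csv")
store.record_copy_run("CleansePipeline", "CopyStaged", "staging/sales.csv", "curated/sales.csv")
print(sorted(store.upstream("curated/sales.csv")))
# ['CleansePipeline/CopyStaged', 'MyFirstPipeline/CopyRaw', 'raw/sales.csv', 'staging/sales.csv']
```

Because every run only contributes a local source-to-sink edge, the end-to-end path emerges from traversal, which is why lineage becomes more useful the more pipelines report to the same catalog.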

If you look closely at the lineage, you will see that you can drill down to the column level and even reveal the column mappings.
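The column mappings shown at this level typically reflect how the Copy activity maps source columns to sink columns. As a rough sketch, assuming an explicit mapping shaped like Data Factory's `TabularTranslator` JSON, column-level lineage reduces to a set of edges from each sink column back to its source column. The function `column_lineage`, the `asset#column` naming, and the column names are hypothetical illustrations, not how Purview stores lineage internally.

```python
# A Copy activity column mapping, modeled on the shape of Data Factory's
# TabularTranslator "mappings" list. Column names are hypothetical.
translator = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "cust_id"}, "sink": {"name": "CustomerId"}},
        {"source": {"name": "amt"}, "sink": {"name": "Amount"}},
        {"source": {"name": "order_dt"}, "sink": {"name": "OrderDate"}},
    ],
}


def column_lineage(source_asset: str, sink_asset: str, translator: dict) -> dict:
    """Return {sink column: source column}, each qualified as 'asset#column'."""
    edges = {}
    for mapping in translator["mappings"]:
        src = f"{source_asset}#{mapping['source']['name']}"
        dst = f"{sink_asset}#{mapping['sink']['name']}"
        edges[dst] = src
    return edges


edges = column_lineage("raw/sales.csv", "staging/sales.csv", translator)
print(edges["staging/sales.csv#Amount"])  # raw/sales.csv#amt
```

Keying the edges by sink column makes the common question "where did this column come from?" a single lookup; columns that the mapping does not cover simply have no entry.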

Imagine the power of this feature when you...