Book Image

Azure Data Engineering Cookbook - Second Edition

By : Nagaraj Venkatesan, Ahmad Osama
Book Image

Azure Data Engineering Cookbook - Second Edition

By: Nagaraj Venkatesan, Ahmad Osama

Overview of this book

The famous quote 'Data is the new oil' seems more true every day as the key to most organizations' long-term success lies in extracting insights from raw data. One of the major challenges organizations face in leveraging value out of data is building performant data engineering pipelines for data visualization, ingestion, storage, and processing. This second edition of the immensely successful book by Ahmad Osama brings to you several recent enhancements in Azure data engineering and shares approximately 80 useful recipes covering common scenarios in building data engineering pipelines in Microsoft Azure. You’ll explore recipes from Azure Synapse Analytics workspaces Gen 2 and get to grips with Synapse Spark pools, SQL Serverless pools, Synapse integration pipelines, and Synapse data flows. You’ll also understand Synapse SQL Pool optimization techniques in this second edition. Besides Synapse enhancements, you’ll discover helpful tips on managing Azure SQL Database and learn about security, high availability, and performance monitoring. Finally, the book takes you through overall data engineering pipeline management, focusing on monitoring using Log Analytics and tracking data lineage using Azure Purview. By the end of this book, you’ll be able to build superior data engineering pipelines along with having an invaluable go-to guide.
Table of Contents (16 chapters)

What this book covers

Chapter 1, Creating and Managing Data in Azure Data Lake, focuses on provisioning, uploading, and managing the data life cycle in Azure Data Lake accounts.

Chapter 2, Securing and Monitoring Data in Azure Data Lake, covers securing an Azure Data Lake account using firewall and private links, accessing data lake accounts using managed identities, and monitoring an Azure Data Lake account using Azure Monitor.

Chapter 3, Building Data Ingestion Pipelines Using Azure Data Factory, covers ingesting data using Azure Data Factory and copying data between Azure SQL Database and Azure Data Lake.

Chapter 4, Azure Data Factory Integration Runtime, focuses on configuring and managing self-hosted integration runtimes and running SSIS packages in Azure using Azure-SSIS integration runtimes.

Chapter 5, Configuring and Securing Azure SQL Database, covers configuring a Serverless SQL database, Hyperscale SQL database, and securing Azure SQL Database using virtual networks and private links.

Chapter 6, Implementing High Availability and Monitoring in Azure SQL Database, explains configuring high availability to Azure SQL Database using auto-failover groups and read replicas, monitoring Azure SQL Database, and the automated scaling of Azure SQL Database during utilization spikes.

Chapter 7, Processing Data Using Azure Databricks, covers integrating Azure Databricks with Azure Data Lake and Azure Key Vault, processing data using Databricks notebooks, working with Delta tables, and visualizing Delta tables using Power BI.

Chapter 8, Processing Data Using Azure Synapse Analytics covers exploring data using Synapse Serverless SQL pool, processing data using Synapse Spark Pools, Working with Synapse Lake database, and integrating Synapse Analytics with Power BI.

Chapter 9, Transforming Data Using Azure Synapse Dataflows, focuses on performing transformations using Synapse Dataflows, optimizing data flows using partitioning, and managing dynamic source schema changes using schema drifting.

Chapter 10, Building the Serving Layer in Azure Synapse SQL Pools, covers loading processed data into Synapse dedicated SQL pools, performing data archival using partitioning, managing table distributions, and optimizing performance using statistics and workload management.

Chapter 11, Monitoring Synapse SQL and Spark Pools, covers monitoring Synapse dedicated SQL and Spark pools using Azure Log Analytics workbooks, Kusto scripts, and Azure Monitor, and monitoring Synapse dedicated SQL pools using Dynamic Management Views (DMVs).

Chapter 12, Optimizing and Maintaining Synapse SQL and Spark Pools, offers techniques for tuning query performance by optimizing query plans, rebuilding replication caches and maintenance scripts to optimize Delta tables, and automatically pausing SQL pools during inactivity, among other things.

Chapter 13, Monitoring and Maintaining Azure Data Engineering Pipelines, covers monitoring and managing end-to-end data engineering pipelines, which includes tracking data lineage using Microsoft Purview and improving the observability of pipeline executions using log analytics and query labeling.