Azure Data Engineering Cookbook - Second Edition

By: Nagaraj Venkatesan, Ahmad Osama

Overview of this book

The famous quote 'Data is the new oil' seems more true every day as the key to most organizations' long-term success lies in extracting insights from raw data. One of the major challenges organizations face in leveraging value out of data is building performant data engineering pipelines for data visualization, ingestion, storage, and processing. This second edition of the immensely successful book by Ahmad Osama brings to you several recent enhancements in Azure data engineering and shares approximately 80 useful recipes covering common scenarios in building data engineering pipelines in Microsoft Azure. You’ll explore recipes from Azure Synapse Analytics workspaces Gen 2 and get to grips with Synapse Spark pools, SQL Serverless pools, Synapse integration pipelines, and Synapse data flows. You’ll also understand Synapse SQL Pool optimization techniques in this second edition. Besides Synapse enhancements, you’ll discover helpful tips on managing Azure SQL Database and learn about security, high availability, and performance monitoring. Finally, the book takes you through overall data engineering pipeline management, focusing on monitoring using Log Analytics and tracking data lineage using Azure Purview. By the end of this book, you’ll be able to build superior data engineering pipelines along with having an invaluable go-to guide.

Configuring encryption using Azure Key Vault for Azure Data Lake

In this recipe, we will create a key vault and use it to encrypt an Azure Data Lake account.

Azure Data Lake accounts are encrypted at rest by default using Microsoft-managed keys. However, you can bring your own key to encrypt an Azure Data Lake account. Using your own key gives you more control over encryption, such as deciding when the key is rotated or revoked.

Getting ready

Before you start, perform the following steps:

  1. Open a web browser and go to the Azure portal at https://portal.azure.com.
  2. Make sure that you have an existing storage account. If not, create one using the Provisioning an Azure storage account using the Azure portal recipe in Chapter 1, Creating and Managing Data in Azure Data Lake.

How to do it…

Perform the following steps to add encryption to a Data Lake account using Azure Key Vault:

  1. Log in to portal.azure.com, click on Create a resource, search for Key Vault, and click on Create. Provide the key vault details, as shown in the following screenshot. Click on Review + Create:
Figure 2.16 – Creating an Azure key vault

  2. Go to the storage account to be encrypted. Search for Encryption in the menu on the left. Click on Encryption and select Customer-managed keys as the Encryption type. Click on Select a key vault and key at the bottom:
Figure 2.17 – Encrypting using customer-managed keys

  3. On the Select a key screen, select Key vault as the Key store type and select the newly created PacktAdeKeyVault as the Key vault. Click on Create new key, as shown in the following screenshot:
Figure 2.18 – Selecting Key Vault

  4. Provide a name for the key that will be used to encrypt the storage account. The default option, Generate, ensures that the key is generated automatically. Click on Create:
Figure 2.19 – Creating a key

  5. Once the key is created, the screen automatically returns to the Select a key page for the storage account, with the newly created key already selected. Click on Select:
Figure 2.20 – Selecting the key

  6. The screen returns to the Encryption page of the storage account. Click on Save to complete the encryption configuration.
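
The portal steps above end up changing the encryption settings stored on the storage account resource. As a rough sketch of what that change looks like, the following Python snippet builds a request body of the kind used to update a storage account through Azure Resource Manager. The property names reflect our understanding of the storage account resource schema and should be checked against the current REST reference. The vault URI is assumed from the PacktAdeKeyVault name used in this recipe, and the key name packt-adls-key is purely hypothetical:

```python
import json
from typing import Optional


def build_cmk_encryption_patch(
    key_vault_uri: str, key_name: str, key_version: Optional[str] = None
) -> dict:
    """Build an update body that points a storage account at a customer-managed key."""
    key_vault_properties = {
        "keyvaulturi": key_vault_uri,
        "keyname": key_name,
    }
    # Only pin a key version when the caller supplies one.
    if key_version is not None:
        key_vault_properties["keyversion"] = key_version
    return {
        "properties": {
            "encryption": {
                "keySource": "Microsoft.Keyvault",
                "keyvaultproperties": key_vault_properties,
            }
        }
    }


patch = build_cmk_encryption_patch(
    key_vault_uri="https://packtadekeyvault.vault.azure.net",  # assumed URI for PacktAdeKeyVault
    key_name="packt-adls-key",  # hypothetical key name
)
print(json.dumps(patch, indent=2))
```

The snippet only constructs and prints the body; it does not call Azure. Sending it would require an authenticated client, which is outside the scope of this recipe.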

How it works…

Because the key in the newly created key vault has been set as the encryption key for the Azure Data Lake account, all Data Lake operations (read, write, and metadata) use that key to encrypt and decrypt the data in Data Lake. Encryption and decryption are fully transparent and do not change how users work with the data.
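
A common way for a vault-held key to protect stored data is envelope encryption: the data is encrypted with a data key, and the vault key only wraps (encrypts) and unwraps (decrypts) that data key. The following toy Python sketch illustrates the pattern. It is not Azure's implementation, the ToyKeyVault class is hypothetical, and the XOR-based cipher is a stand-in that must not be used for real encryption:

```python
import hashlib
import secrets


def toy_xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): applying it twice with the same key restores the input."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKeyVault:
    """Hypothetical stand-in for a key vault: the vault key never leaves this object."""

    def __init__(self) -> None:
        self._vault_key = secrets.token_bytes(32)

    def wrap_key(self, data_key: bytes) -> bytes:
        return toy_xor_cipher(self._vault_key, data_key)

    def unwrap_key(self, wrapped_key: bytes) -> bytes:
        return toy_xor_cipher(self._vault_key, wrapped_key)


vault = ToyKeyVault()
original = b"sales,2023,42"

# Write path: encrypt the data with a data key, then store only the wrapped data key.
data_key = secrets.token_bytes(32)
ciphertext = toy_xor_cipher(data_key, original)
wrapped_data_key = vault.wrap_key(data_key)

# Read path: ask the vault to unwrap the data key, then decrypt the data.
recovered = toy_xor_cipher(vault.unwrap_key(wrapped_data_key), ciphertext)
print(recovered == original)
```

In this pattern, revoking access to the vault key makes the wrapped data key unusable, which in turn makes the stored data unreadable without re-encrypting the data itself.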

The Data Lake account is automatically granted permissions on the key vault so that it can retrieve the key and use it for encryption. You can verify this by opening the key vault in the Azure portal and clicking on Access Policies. Note that the storage account has been granted the Get, Wrap Key, and Unwrap Key permissions on keys, as shown in the next screenshot:

Figure 2.21 – Storage account permissions in Key Vault
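
The same check can be expressed in a few lines of Python once the key permissions listed in an access policy are available as strings. This is a minimal sketch: the helper name missing_key_permissions is hypothetical, the permission identifiers get, wrapKey, and unwrapKey are assumed to match the names shown in the portal, and retrieving the access policy itself is not shown:

```python
# Key permissions the storage account needs, compared case-insensitively.
REQUIRED_KEY_PERMISSIONS = {"get", "wrapkey", "unwrapkey"}


def missing_key_permissions(granted: list[str]) -> set[str]:
    """Return the required key permissions that are absent from an access policy."""
    normalized = {permission.lower() for permission in granted}
    return REQUIRED_KEY_PERMISSIONS - normalized


# Example: permissions as they might appear on the storage account's access policy.
print(missing_key_permissions(["get", "wrapKey", "unwrapKey"]))
```

An empty result means the access policy covers everything the storage account needs; any names returned are permissions that would have to be added.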