Azure Data Engineering Cookbook

By: Ahmad Osama

Overview of this book

Data engineering is one of the fastest-growing job areas, because data engineers are the ones who ensure that data is extracted, provisioned, and of the highest quality for data analysis. This book uses various Azure services to implement and maintain infrastructure that extracts data from multiple sources, and then transforms and loads it for data analysis. It takes you through different techniques for performing big data engineering using Microsoft Azure data services. It begins by showing you how Azure Blob storage can be used for storing large amounts of unstructured data and how to use it for orchestrating a data workflow. You'll then work with different Cosmos DB APIs and Azure SQL Database. Moving on, you'll discover how to provision an Azure Synapse database and find out how to ingest and analyze data in Azure Synapse. As you advance, you'll cover the design and implementation of batch processing solutions using Azure Data Factory, and understand how to manage, maintain, and secure Azure Data Factory pipelines. You'll also design and implement batch processing solutions using Azure Databricks and then manage and secure Azure Databricks clusters and jobs. In the concluding chapters, you'll learn how to process streaming data using Azure Stream Analytics and Data Explorer. By the end of this book, you'll have the knowledge you need to orchestrate batch and real-time ETL workflows in Microsoft Azure.

Provisioning and connecting to an Azure PostgreSQL database using the Azure CLI

Azure Database for PostgreSQL is a Database-as-a-Service offering for the PostgreSQL database. In this recipe, we'll learn how to provision an Azure Database for PostgreSQL server and connect to it.

Getting ready

We'll be using the Azure CLI for this recipe. Open a new Command Prompt or PowerShell window, and run az login to log in to the Azure CLI.

How to do it…

Let's begin with provisioning a new Azure PostgreSQL server.

Provisioning a new Azure PostgreSQL server

The steps are as follows:

  1. Execute the following Azure CLI command to create a new resource group:
    az group create --name rgpgressql --location eastus
  2. Execute the following command to create an Azure server for PostgreSQL:
    az postgres server create --resource-group rgpgressql --name adepgresqlserver --location eastus --admin-user pgadmin --admin-password postgre@SQL@1234 --sku-name B_Gen5_1

    It may take 10–15 minutes for the server to be created.

  3. Execute the following commands to add your client's public IP address to the PostgreSQL server firewall:
    $clientip = (Invoke-RestMethod -Uri
    az postgres server firewall-rule create --resource-group rgpgressql --server adepgresqlserver --name hostip --start-ip-address $clientip --end-ip-address $clientip

Connecting to an Azure PostgreSQL server

We can connect to an Azure PostgreSQL server using psql or pgAdmin (a GUI tool for PostgreSQL management), or from any programming language using a relevant driver.

To connect from psql, execute the following command in a Command Prompt or PowerShell window:

PS C:\Program Files\PostgreSQL\12\bin> .\psql.exe --host=adepgresqlserver.postgres.database.azure.com --port=5432 --username=pgadmin@adepgresqlserver --dbname=postgres

Provide the password and you'll be connected. You should get an output similar to the one shown in the following screenshot:

Figure 2.5 – Connecting to PostgreSQL

How it works…

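To provision a new Azure PostgreSQL server, we run the az postgres server create Azure CLI command. We need to specify the server name, resource group, administrator username and password, location, and SKU name parameters. The SKU name has the form tier_generation_vCores, so B_Gen5_1 is a Basic-tier, Gen5 server with one vCore. As of now, there are three pricing tiers: Basic (B), General Purpose (GP), and Memory Optimized (MO).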
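
The --sku-name value passed in the recipe, B_Gen5_1, packs the pricing tier, compute generation, and vCore count into one string. As a rough illustration of how to read such a value, the following Python sketch splits it apart. The PostgresSku class and parse_sku_name function are hypothetical helpers written for this example, not part of the Azure CLI or any Azure SDK.

```python
from dataclasses import dataclass

# Tier abbreviations that appear at the start of a --sku-name value.
TIER_NAMES = {
    "B": "Basic",
    "GP": "General Purpose",
    "MO": "Memory Optimized",
}


@dataclass(frozen=True)
class PostgresSku:
    """Hypothetical container for the three parts of a SKU name."""
    tier: str         # full tier name, for example "Basic"
    generation: str   # compute generation, for example "Gen5"
    vcores: int       # number of vCores


def parse_sku_name(sku_name: str) -> PostgresSku:
    """Split a SKU name such as 'B_Gen5_1' into tier, generation, and vCores."""
    parts = sku_name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected tier_generation_vcores, got {sku_name!r}")
    tier_code, generation, vcores = parts
    if tier_code not in TIER_NAMES:
        raise ValueError(f"unknown tier abbreviation {tier_code!r}")
    return PostgresSku(TIER_NAMES[tier_code], generation, int(vcores))


# The SKU used in this recipe: Basic tier, Gen5 hardware, one vCore.
print(parse_sku_name("B_Gen5_1"))
```

A check like this can be handy in a provisioning script to catch a mistyped SKU before calling az postgres server create, although the Azure CLI itself remains the authority on which SKUs are valid in a given region.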

To connect to the PostgreSQL server, we first need to allow the client's IP address through the server firewall. To do that, we run the az postgres server firewall-rule create Azure CLI command.

We need to provide the firewall rule name, server name, resource group, and start and end IP.
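
A firewall rule describes an inclusive range of addresses, which is why the recipe passes the same client IP as both the start and the end: the rule then admits exactly one address. The following Python sketch models that range check with the standard ipaddress module. The ip_in_rule function is a hypothetical helper for illustration, and the addresses are taken from the reserved documentation range, not from a real rule.

```python
import ipaddress


def ip_in_rule(client_ip: str, start_ip: str, end_ip: str) -> bool:
    """Return True if client_ip lies in the inclusive range start_ip..end_ip."""
    client = ipaddress.IPv4Address(client_ip)
    start = ipaddress.IPv4Address(start_ip)
    end = ipaddress.IPv4Address(end_ip)
    if start > end:
        raise ValueError("start IP must not be greater than end IP")
    return start <= client <= end


# A single-address rule, as created in the recipe (example address only).
single = ("203.0.113.10", "203.0.113.10")
print(ip_in_rule("203.0.113.10", *single))  # the address named in the rule
print(ip_in_rule("203.0.113.11", *single))  # a neighbouring address

# A wider rule covering a small block of addresses.
print(ip_in_rule("203.0.113.42", "203.0.113.0", "203.0.113.255"))
```

This only mirrors the range semantics locally; the actual enforcement happens in the server firewall.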

Once the firewall rule is created, the PostgreSQL server can be accessed with utilities such as psql or pgAdmin, or from a programming language. To connect, provide the host as <postgresql server name>.postgres.database.azure.com and the port as 5432. We also need to provide the username, in the form <admin user>@<postgresql server name>, and the password. If you are connecting for the first time, provide the database name as postgres.
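
When connecting from a programming language, the same host, port, username, and database values are usually passed to the driver as keyword parameters or as a connection URI. The following Python sketch assembles both forms from the server and admin names used in this recipe, using only the standard library. The build_connection_params and to_uri functions are hypothetical helpers, and sslmode=require is an assumption that the server enforces SSL connections. The password is deliberately left out so that the client can prompt for it.

```python
from urllib.parse import quote


def build_connection_params(server_name, admin_user, database="postgres", port=5432):
    """Return libpq-style connection parameters for an Azure PostgreSQL server."""
    return {
        # The fully qualified host name is the server name plus the Azure suffix.
        "host": f"{server_name}.postgres.database.azure.com",
        "port": port,
        # The login is the admin user qualified with the server name.
        "user": f"{admin_user}@{server_name}",
        "dbname": database,
        # Assumption: the server requires SSL connections.
        "sslmode": "require",
    }


def to_uri(params):
    """Render the parameters as a postgresql:// URI without a password."""
    # The '@' inside the username must be percent-encoded within a URI.
    user = quote(params["user"], safe="")
    return (
        f"postgresql://{user}@{params['host']}:{params['port']}"
        f"/{params['dbname']}?sslmode={params['sslmode']}"
    )


params = build_connection_params("adepgresqlserver", "pgadmin")
print(params["host"])
print(params["user"])
print(to_uri(params))
```

The parameter dictionary matches the keyword style accepted by common PostgreSQL drivers, while the URI form can be passed to psql as a single argument. Which form to use depends on the driver, so check its documentation before relying on either.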