Limitless Analytics with Azure Synapse

By: Prashant Kumar Mishra

Overview of this book

Azure Synapse Analytics, which Microsoft describes as the next evolution of Azure SQL Data Warehouse, is a limitless analytics service that brings enterprise data warehousing and big data analytics together. With this book, you'll learn how to discover insights from your data effectively using this platform. The book starts with an overview of Azure Synapse Analytics, its architecture, and how it can be used to improve business intelligence and machine learning capabilities. Next, you'll go on to choose and set up the correct environment for your business problem. You'll also learn a variety of ways to ingest data from various sources and orchestrate the data using transformation techniques offered by Azure Synapse. Later, you'll explore how to handle both relational and non-relational data using the SQL language. As you progress, you'll perform real-time streaming and execute data analysis operations on your data using various languages, before going on to apply ML techniques to derive accurate and granular insights from data. Finally, you'll discover how to protect sensitive data in real time by using security and privacy features. By the end of this Azure book, you'll be able to build end-to-end analytics solutions while focusing on data prep, data management, data warehousing, and AI tasks.
Table of Contents (20 chapters)
Section 1: The Basics and Key Concepts
Section 2: Data Ingestion and Orchestration
Section 3: Azure Synapse for Data Scientists and Business Analysts
Section 4: Best Practices

Understanding Azure Data Lake

A data lake is a storage repository that lets you store data at any scale in its native format, without having to structure it first.

Azure Data Lake Storage provides secure, scalable, cost-effective storage for big data analytics. There are two generations of Azure Data Lake, Gen1 and Gen2; this chapter focuses only on Gen2. Azure Data Lake Gen2 combines the capabilities of Azure Data Lake Gen1 with those of Azure Blob Storage by adding a hierarchical namespace to Blob Storage. Because it is built on Azure Blob Storage, you also get high availability and disaster recovery options for your data lake at a low cost.

The Azure Blob File System (ABFS) driver is available within Azure HDInsight, Azure Databricks, and Azure Synapse Analytics. It lets you access data in Data Lake Storage Gen2 in much the same way as you would access data in the Hadoop Distributed File System (HDFS).
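To illustrate how the ABFS driver addresses data, the following minimal sketch builds a URI of the form `abfss://<container>@<account>.dfs.core.windows.net/<path>`. The helper name `abfs_uri` and the account, container, and path values are hypothetical and chosen only for illustration.

```python
def abfs_uri(account: str, container: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS URI for a path in a Data Lake Storage Gen2 account.

    `abfss` is the TLS-secured scheme; `abfs` is the unsecured variant.
    """
    scheme = "abfss" if secure else "abfs"
    # Strip leading slashes so the URI has exactly one separator before the path.
    clean_path = path.lstrip("/")
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical account and container names, used only for illustration.
uri = abfs_uri("mydatalake", "raw", "/sales/2021/orders.csv")
print(uri)
```

In a Spark notebook, a URI of this shape would typically be passed to a reader such as `spark.read.csv(uri)`, assuming the workspace has been granted access to the storage account.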

To use the capabilities of Data Lake Storage Gen2, you need to create a storage account that has a hierarchical namespace. Follow these steps to create your Azure Data Lake Storage Gen2 account:

  1. Log in to the Azure portal: https://portal.azure.com.
  2. Click on the + Create a Resource link and select Storage account from the list of all available resources.
  3. Select the Resource group where you want to create your storage account. If you don't have a Resource group created, click on the Create new link below the drop-down list.
  4. Fill in the fields for Storage account name and Location.
  5. Select Standard or Premium for Performance, depending on your business need. If you are new to Data Lake, it is better to begin with Standard.
  6. Select appropriate values for Account kind and Replication, depending on your business need. If you are following these steps only for learning purposes, we recommend leaving the default values in these fields:
    Figure 1.10 – Creating Azure Data Lake Gen2 in Azure

  7. For now, we can skip the Networking and Data protection tabs and move directly to the Advanced tab.
  8. Click on the Enabled radio button for the Hierarchical namespace property under the Advanced tab:
    Figure 1.11 – Enabling Hierarchical namespace for Data Lake Storage Gen2 on the Advanced tab

  9. Leave the default values for all other fields and click on Review + create.
  10. After reviewing all the details, click on Create and your Azure Data Lake Gen2 account will be created in a couple of minutes.
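
The portal steps above can also be expressed as a deployment template. The following is a minimal sketch, not the book's own method: it uses only the Python standard library to build an Azure Resource Manager (ARM) template for a storage account with the hierarchical namespace enabled through the `isHnsEnabled` property. The function names, the example account name, the `eastus` location, the `Standard_LRS` SKU, and the API version string are assumptions chosen for illustration; adjust them to your own subscription.

```python
import json
import re


def validate_account_name(name: str) -> None:
    """Storage account names must be 3-24 characters: lowercase letters and digits only."""
    if not re.fullmatch(r"[a-z0-9]{3,24}", name):
        raise ValueError(f"invalid storage account name: {name!r}")


def build_adls_gen2_template(name: str, location: str, sku: str = "Standard_LRS") -> dict:
    """Return an ARM template (as a dict) for a Data Lake Storage Gen2 account."""
    validate_account_name(name)
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2022-09-01",  # assumed API version
                "name": name,
                "location": location,
                "sku": {"name": sku},  # performance tier and replication
                "kind": "StorageV2",  # general-purpose v2 account
                # Equivalent to enabling Hierarchical namespace on the Advanced tab.
                "properties": {"isHnsEnabled": True},
            }
        ],
    }


# Hypothetical account name and region, used only for illustration.
template = build_adls_gen2_template("mydatalake", "eastus")
print(json.dumps(template, indent=2))
```

A template like this could then be saved to a file and deployed with the Azure CLI, for example with `az deployment group create --resource-group <group> --template-file <file>`, where the resource group and file name are your own.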

Now that you have created your Azure Data Lake Gen2 account, you can use it with Azure Synapse Analytics. We will learn how to read data from Data Lake in later chapters. For now, we will look at Azure Synapse Studio and how it provides a unified experience when working with various resources under one roof.