Cloud Scale Analytics with Azure Data Services

By: Patrik Borosch

Overview of this book

Azure Data Lake, the modern data warehouse architecture, and related data services on Azure enable organizations to build customized analytical platforms that fit any analytical requirement in terms of data volume, speed, and quality. This book is your guide to the features and capabilities of Azure data services for storing, processing, and analyzing structured, unstructured, and semi-structured data of any size. You will explore key techniques for ingesting and storing data and for performing batch, streaming, and interactive analytics. The book also shows you how to overcome challenges and complexities relating to productivity and scaling. Next, you will learn to develop and run massive data workloads. Using a cloud-based setup that combines big data, a modern data warehouse, and analytics, you will also be able to build secure, scalable data estates for enterprises. Finally, you will learn not only how to develop a data warehouse but also how to build enterprise-grade security and auditing into big data programs. By the end of this Azure book, you will have learned how to develop a powerful and efficient analytical platform that meets enterprise needs.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Share Your Thoughts

Section 1: Data Warehousing and Considerations Regarding Cloud Computing

Chapter 1: Balancing the Benefits of Data Lakes Over Data Warehouses

Distinguishing between Data Warehouses and Data Lakes

Understanding the opportunities of modern cloud computing

Exploring the benefits of AI and ML

Answering the question

Chapter 2: Connecting Requirements and Technology

Formulating your requirements

Understanding basic architecture patterns

Finding the right Azure tool for the right purpose

Understanding Industry Data Models

Thinking about different sizes

Understanding the supporting services

Section 2: The Storage Layer

Chapter 3: Understanding the Data Lake Storage Layer

Technical requirements

Setting up your Cloud Big Data Storage

Organizing your data lake

Implementing a data model in your Data Lake

Monitoring your storage account

Talking about backups

Implementing access control in your Data Lake

Setting the networking options

Discovering additional knowledge

Further reading

Chapter 4: Understanding Synapse SQL Pools and SQL Options

Uncovering MPP in the cloud – the power of 60

Provisioning a Synapse dedicated SQL pool

Talking about partitioning

Implementing workload management

Scaling the database

Understanding other SQL options in Azure

Further reading

Section 3: Cloud-Scale Data Integration and Data Transformation

Chapter 5: Integrating Data into Your Modern Data Warehouse

Technical requirements

Setting up Azure Data Factory

Examining the authoring environment

Adding data transformation logic

Understanding integration runtimes

Integrating with DevOps

Further reading

Chapter 6: Using Synapse Spark Pools

Technical requirements

Setting up a Synapse Spark pool

Examining the Synapse Spark architecture

Programming with Synapse Spark pools

Using additional libraries with your Spark pool

Handling security

Monitoring your Synapse Spark pools

Further reading

Chapter 7: Using Databricks Spark Clusters

Technical requirements

Provisioning Databricks

Examining the Databricks workspace

Understanding the Databricks components

Setting up security

Monitoring Databricks

Further reading

Chapter 8: Streaming Data into Your MDWH

Technical requirements

Provisioning ASA

Implementing an ASA job

Understanding ASA SQL

Using Structured Streaming with Spark

Security in your streaming solution

Monitoring your streaming solution

Further reading

Chapter 9: Integrating Azure Cognitive Services and Machine Learning

Technical requirements

Understanding Azure Cognitive Services

Using Cognitive Services with your data

Examining Azure Machine Learning

Using Azure Machine Learning with your modern data warehouse

Further reading

Chapter 10: Loading the Presentation Layer

Technical requirements

Understanding the loading strategy with Synapse dedicated SQL pools

Loading data into Synapse dedicated SQL pools

Using Synapse serverless SQL pools

Integrating data with Synapse Spark pools

Exchanging metadata between computes

Further reading

Section 4: Data Presentation, Dashboarding, and Distribution

Chapter 11: Developing and Maintaining the Presentation Layer

Developing with Synapse Studio

Backing up and DR in Azure Synapse

Monitoring your MDWH

Understanding security in your MDWH

Further reading

Chapter 12: Distributing Data

Technical requirements

Building data marts with Power BI

Creating data models with Azure Analysis Services

Distributing data using Azure Data Share

Further reading

Chapter 13: Introducing Industry Data Models

Understanding Common Data Model

Examining and leveraging predefined entities

Discovering Azure Industry Data Workbench

Further reading

Chapter 14: Establishing Data Governance

Technical requirements

Discovering Azure Purview

Classifying data

Integrating with Azure services

Using data lineage

Discovering Insights

Discovering more Purview

Further reading

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Setting up a Synapse Spark pool

In this section, let's examine the basic steps for spinning up a Synapse Spark pool.

This task is straightforward in a Synapse workspace:

1. Navigate to the Management pane and, in the Analytics pools section, select Apache Spark pools.
2. In the Details pane, click + New. The configuration blade for a new Apache Spark pool is displayed:
   Figure 6.1 – Create Apache Spark pool – The Basics blade
3. Here, you name the new Spark pool, choose the node size, enable or disable Autoscale, and, if autoscaling is enabled, set the lower and upper boundaries for the number of nodes. The last row in this view shows the estimated cost of the lowest and the highest autoscale setting. Click Next: Additional settings.
4. In the upper area of the Additional settings blade, you can now configure Auto-pause and Number of minutes idle, which sets how long the pool can stay idle before it pauses. In the Component...
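The Basics and Additional settings blades boil down to a handful of values. As a minimal sketch, assuming Python and purely for illustration, the hypothetical `SparkPoolSettings` class below captures them, checks the autoscale and auto-pause boundaries, builds an ARM-style properties object, and estimates the low and high cost, much like the last row of the Basics blade. The property names (`nodeSize`, `nodeSizeFamily`, `autoScale`, `autoPause`, `delayInMinutes`, `nodeCount`) follow the `Microsoft.Synapse/workspaces/bigDataPools` resource as we understand it. The vCores per node size, the minimums of 3 nodes and 5 idle minutes, and the price per vCore-hour are assumptions, so check them against the current Azure documentation and pricing page.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed vCores per node for each Synapse Spark node size;
# verify against the current Azure documentation.
VCORES_PER_NODE: Dict[str, int] = {
    "Small": 4,
    "Medium": 8,
    "Large": 16,
    "XLarge": 32,
    "XXLarge": 64,
}

MIN_NODES = 3          # assumed smallest node count a Spark pool accepts
MIN_IDLE_MINUTES = 5   # assumed smallest auto-pause delay


@dataclass
class SparkPoolSettings:
    """Hypothetical model of the values entered in the two blades."""
    node_size: str = "Medium"
    autoscale_enabled: bool = True
    min_nodes: int = MIN_NODES   # lower autoscale boundary
    max_nodes: int = 10          # upper boundary, or fixed size if Autoscale is off
    auto_pause_enabled: bool = True
    idle_minutes: int = 15       # "Number of minutes idle"

    def validate(self) -> None:
        """Raise ValueError if the settings break the assumed limits."""
        if self.node_size not in VCORES_PER_NODE:
            raise ValueError(f"unknown node size: {self.node_size}")
        if self.max_nodes < MIN_NODES:
            raise ValueError(f"a pool needs at least {MIN_NODES} nodes")
        if self.autoscale_enabled and not (MIN_NODES <= self.min_nodes <= self.max_nodes):
            raise ValueError("autoscale needs MIN_NODES <= min_nodes <= max_nodes")
        if self.auto_pause_enabled and self.idle_minutes < MIN_IDLE_MINUTES:
            raise ValueError(f"auto-pause delay must be >= {MIN_IDLE_MINUTES} minutes")

    def to_properties(self) -> dict:
        """Return an ARM-style 'properties' object for a bigDataPools resource."""
        self.validate()
        props = {
            "nodeSize": self.node_size,
            "nodeSizeFamily": "MemoryOptimized",
            "autoScale": {
                "enabled": self.autoscale_enabled,
                "minNodeCount": self.min_nodes,
                "maxNodeCount": self.max_nodes,
            },
            "autoPause": {
                "enabled": self.auto_pause_enabled,
                "delayInMinutes": self.idle_minutes,
            },
        }
        if not self.autoscale_enabled:
            props["nodeCount"] = self.max_nodes  # fixed pool size
        return props

    def cost_range(self, price_per_vcore_hour: float) -> Tuple[float, float]:
        """Estimate hourly cost at the lowest and highest node count."""
        self.validate()
        vcores = VCORES_PER_NODE[self.node_size]
        low_nodes = self.min_nodes if self.autoscale_enabled else self.max_nodes
        return (low_nodes * vcores * price_per_vcore_hour,
                self.max_nodes * vcores * price_per_vcore_hour)


if __name__ == "__main__":
    pool = SparkPoolSettings(node_size="Medium", min_nodes=3, max_nodes=10)
    print(pool.to_properties())
    # 0.10 is a made-up price per vCore-hour, not an Azure list price.
    print(pool.cost_range(0.10))
```

Keeping the checks in one place means an inverted autoscale range or a too-short idle time is caught before the values reach an ARM template or deployment script, rather than at deployment time.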