Learn Azure Synapse Data Explorer

By: Pericles (Peri) Rocha

Overview of this book

Large volumes of data are generated daily from applications, websites, IoT devices, and other free-text, semi-structured data sources. Azure Synapse Data Explorer helps you collect, store, and analyze such data, and work with other analytical engines, such as Apache Spark, to develop advanced data science projects and maximize the value you extract from data. This book offers a comprehensive view of Azure Synapse Data Explorer, exploring not only the core scenarios of Data Explorer but also how it integrates within Azure Synapse. From data ingestion to data visualization and advanced analytics, you’ll learn to take an end-to-end approach to maximize the value of unstructured data and drive powerful insights using data science capabilities. With real-world usage scenarios, you’ll discover how to identify key projects where Azure Synapse Data Explorer can help you achieve your business goals. Throughout the chapters, you'll also find out how to manage big data as part of a software as a service (SaaS) platform, as well as tune, secure, and serve data to end users. By the end of this book, you’ll have mastered the big data life cycle and you'll be able to implement advanced analytical scenarios from raw telemetry and log data.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1 Introduction to Azure Synapse Data Explorer

Chapter 1: Introducing Azure Synapse Data Explorer

Technical requirements

Understanding the lifecycle of data

The need for a fast and highly scalable data exploration service

What is Azure Synapse?

What is Azure Synapse Data Explorer?

Integrating Data Explorer pools with other Azure Synapse services

Exploring the Data Explorer pool infrastructure and scalability

What makes Azure Synapse Data Explorer unique?

When to use Azure Synapse Data Explorer

Summary

Chapter 2: Creating Your First Data Explorer Pool

Technical requirements

Creating a free Azure account

Creating an Azure Synapse workspace

Creating a Data Explorer pool using Azure Synapse Studio

Creating a Data Explorer pool using the Azure portal

Creating a Data Explorer pool using the Azure CLI

Summary

Chapter 3: Exploring Azure Synapse Studio

Technical requirements

Exploring the user interface of Azure Synapse Studio

Running your first query

Managing and monitoring Data Explorer pools

Monitoring Data Explorer pools

Summary

Chapter 4: Real-World Usage Scenarios

Technical requirements

Building a multi-purpose end-to-end analytics environment

Managing IoT data

Processing and analyzing geospatial data

Enabling real-time analytics with big data

Performing time series analytics

Summary

Part 2 Working with Data

Chapter 5: Ingesting Data into Data Explorer Pools

Technical requirements

Understanding the data loading process

Defining a retention policy

Choosing a data load strategy

Performing data ingestion

Summary

Chapter 6: Data Analysis and Exploration with KQL and Python

Technical requirements

Analyzing data with KQL

Exploring Data Explorer pool data with Python

Summary

Chapter 7: Data Visualization with Power BI

Technical requirements

Introduction to the Power BI integration

Creating a Power BI report

Adding data sources to your Power BI report

Connecting Power BI with your Azure Synapse workspace

Authoring Power BI reports from Azure Synapse Studio

Summary

Chapter 8: Building Machine Learning Experiments

Technical requirements

Understanding the application of ML

Introducing ML into your projects with AutoML

Exploring additional ML capabilities in Azure Synapse

Summary

Chapter 9: Exporting Data from Data Explorer Pools

Technical requirements

Understanding data export scenarios

Exporting data with client tools

Using server-side export to pull data

Performing robust exports with server-side data push

Configuring continuous data export

Summary

Part 3 Managing Azure Synapse Data Explorer

Chapter 10: System Monitoring and Diagnostics

Technical requirements

Monitoring your environment

Setting up alerts

Summary

Chapter 11: Tuning and Resource Management

Technical requirements

Implementing resource governance with workload groups

Speeding up queries using cache policies

Summary

Chapter 12: Securing Your Environment

Technical requirements

Security overview

Managing data encryption

Authenticating users

Configuring access to resources

Implementing network security

Protecting against external threats

Summary

Chapter 13: Advanced Data Management

Technical requirements

Managing extents

Purging personal data

Summary

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a free PDF copy of this book

When to use Azure Synapse Data Explorer

By now, you should understand that Azure Synapse Data Explorer is an analytical engine that processes queries on unstructured, semi-structured, and structured data at exceptionally large volumes, with low-latency ingestion and fast query response times. Data Explorer is not, however, the solution to every data problem, and in some cases you will be better off with a different service. Let us look at some of the most common analytics scenarios and the most appropriate analytical store for each:

Scenario: I need a classic data warehouse.

Recommendation: Do not use Azure Synapse Data Explorer. Use dedicated SQL pools in Azure Synapse, which are optimized for user queries in a typical star schema, even at large data volumes.

Scenario: My solution requires frequent updates on individual records, and singleton INSERT, UPDATE, and DELETE operations.

Recommendation: Do not use Azure Synapse Data Explorer. In such cases, a transactional, operational database will be a better solution. Consider options such as Azure SQL, SQL Server (on-premises, or in an Azure VM), MySQL, or even Cosmos DB for NoSQL scenarios.

Scenario: My solution needs to run on a cloud other than Microsoft Azure, or on-premises.

Recommendation: Do not use Azure Synapse Data Explorer, as it runs exclusively on Azure.

Scenario: My data demands constant transformation and long-running extract, transform, load (ETL)/extract, load, transform (ELT) processes.

Recommendation: Do not use Azure Synapse Data Explorer. Even though you have Synapse pipelines in your Synapse workspace and can continuously ingest data into Data Explorer pools, the core scenario for Data Explorer is interactive analytics on big data. You are better off running your ETL/ELT processes on Azure Synapse pipelines, Azure Data Factory (ADF), Apache Spark, or even Azure Batch.

Scenario: I need to train large ML models several times throughout the day.

Recommendation: This may be a good scenario for Azure Synapse Data Explorer. You can prepare data or train models on Apache Spark for Azure Synapse, but note that you will miss out on the real-time data analysis that Data Explorer offers. Ideally, you want to use Data Explorer with data streaming from devices and applications in real time, but this can still be a valuable scenario for Azure Synapse Data Explorer. It may be less valuable with the standalone Azure Data Explorer service, which does not benefit from the native, in-product integration with Apache Spark (even though a connector for Spark is available to Azure Data Explorer users).

Scenario: I have a very small amount of data to analyze.

Recommendation: It depends. If your analysis requires full-text search or involves JSON documents, you may benefit from the indexing capabilities of Azure Synapse Data Explorer. It can also be a suitable alternative if you need to correlate this data with other data stored on Synapse SQL or in the data lake. If you are on a tight budget and don't need the added benefits of Azure Synapse, you may be better served by SQL Server, Azure Cognitive Search, or even Cosmos DB.

Scenario: I need to perform time-series analysis on metric data from sensors, social media, websites, financial transactions, or other fast streaming data.

Recommendation: You should use Azure Synapse Data Explorer. Data Explorer pools are optimized for application log and IoT device data and can ingest data at high volumes, offering insights in near real time.

Scenario: My data has a diverse schema and arrives in high volumes in near real time.

Recommendation: You should use Azure Synapse Data Explorer. Data Explorer pools are optimized for unstructured, semi-structured, and structured data and allow you to run interactive analytics on data of any shape.

Scenario: I need to correlate application logs or telemetry data from IoT devices with data sitting in a data warehouse and the data lake.

Recommendation: You should use Azure Synapse Data Explorer. By leveraging the SQL analytical pools in Azure Synapse (dedicated and serverless), you can use one tool to query all your data, regardless of the analytical store that holds it.

The rule of thumb is to think about Data Explorer pools when you are managing telemetry or log analytics data at scale. You should use it with Azure Synapse when you need to combine your analysis with data from other sources or use the added benefits of Azure Synapse in your project.
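To make the time-series scenario above more concrete, here is a minimal sketch of the kind of aggregation it involves: grouping timestamped sensor readings into fixed-width time bins and averaging each bin. In a Data Explorer pool, you would normally express this in KQL (for example, with summarize and bin()) and let the engine do the work at scale. The plain-Python version below only illustrates the idea. The function name bin_average and the sample readings are hypothetical and are not taken from the book.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_average(readings, bin_size):
    """Average (timestamp, value) readings per fixed-width time bin.

    Hypothetical helper for illustration only; not part of any Azure SDK.
    """
    epoch = datetime(1970, 1, 1)
    totals = defaultdict(lambda: [0.0, 0])  # bin start -> [sum, count]
    for ts, value in readings:
        # Floor the timestamp to the start of the bin it falls into.
        bin_start = ts - ((ts - epoch) % bin_size)
        totals[bin_start][0] += value
        totals[bin_start][1] += 1
    return {
        start: total / count
        for start, (total, count) in sorted(totals.items())
    }


# Made-up temperature readings from a single sensor.
readings = [
    (datetime(2023, 1, 1, 0, 0, 10), 20.0),
    (datetime(2023, 1, 1, 0, 0, 50), 22.0),
    (datetime(2023, 1, 1, 0, 1, 5), 30.0),
]

per_minute = bin_average(readings, timedelta(minutes=1))
for start, average in per_minute.items():
    print(start.isoformat(), average)
```

This single-process loop is fine for a handful of readings. It is not a substitute for a Data Explorer pool once data arrives continuously and at high volume, which is the situation the recommendation above is written for.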
