Learn Azure Synapse Data Explorer

By: Pericles (Peri) Rocha

Overview of this book

Large volumes of data are generated daily from applications, websites, IoT devices, and other free-text, semi-structured data sources. Azure Synapse Data Explorer helps you collect, store, and analyze such data, and work with other analytical engines, such as Apache Spark, to develop advanced data science projects and maximize the value you extract from data. This book offers a comprehensive view of Azure Synapse Data Explorer, exploring not only the core scenarios of Data Explorer but also how it integrates within Azure Synapse. From data ingestion to data visualization and advanced analytics, you’ll learn to take an end-to-end approach to maximize the value of unstructured data and drive powerful insights using data science capabilities. With real-world usage scenarios, you’ll discover how to identify key projects where Azure Synapse Data Explorer can help you achieve your business goals. Throughout the chapters, you'll also find out how to manage big data as part of a software as a service (SaaS) platform, as well as tune, secure, and serve data to end users. By the end of this book, you’ll have mastered the big data life cycle and you'll be able to implement advanced analytical scenarios from raw telemetry and log data.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1 Introduction to Azure Synapse Data Explorer

Chapter 1: Introducing Azure Synapse Data Explorer

Technical requirements

Understanding the lifecycle of data

The need for a fast and highly scalable data exploration service

What is Azure Synapse?

What is Azure Synapse Data Explorer?

Integrating Data Explorer pools with other Azure Synapse services

Exploring the Data Explorer pool infrastructure and scalability

What makes Azure Synapse Data Explorer unique?

When to use Azure Synapse Data Explorer

Summary

Chapter 2: Creating Your First Data Explorer Pool

Technical requirements

Creating a free Azure account

Creating an Azure Synapse workspace

Creating a Data Explorer pool using Azure Synapse Studio

Creating a Data Explorer pool using the Azure portal

Creating a Data Explorer pool using the Azure CLI

Summary

Chapter 3: Exploring Azure Synapse Studio

Technical requirements

Exploring the user interface of Azure Synapse Studio

Running your first query

Managing and monitoring Data Explorer pools

Monitoring Data Explorer pools

Summary

Chapter 4: Real-World Usage Scenarios

Technical requirements

Building a multi-purpose end-to-end analytics environment

Managing IoT data

Processing and analyzing geospatial data

Enabling real-time analytics with big data

Performing time series analytics

Summary

Part 2 Working with Data

Chapter 5: Ingesting Data into Data Explorer Pools

Technical requirements

Understanding the data loading process

Defining a retention policy

Choosing a data load strategy

Performing data ingestion

Summary

Chapter 6: Data Analysis and Exploration with KQL and Python

Technical requirements

Analyzing data with KQL

Exploring Data Explorer pool data with Python

Summary

Chapter 7: Data Visualization with Power BI

Technical requirements

Introduction to the Power BI integration

Creating a Power BI report

Adding data sources to your Power BI report

Connecting Power BI with your Azure Synapse workspace

Authoring Power BI reports from Azure Synapse Studio

Summary

Chapter 8: Building Machine Learning Experiments

Technical requirements

Understanding the application of ML

Introducing ML into your projects with AutoML

Exploring additional ML capabilities in Azure Synapse

Summary

Chapter 9: Exporting Data from Data Explorer Pools

Technical requirements

Understanding data export scenarios

Exporting data with client tools

Using server-side export to pull data

Performing robust exports with server-side data push

Configuring continuous data export

Summary

Part 3 Managing Azure Synapse Data Explorer

Chapter 10: System Monitoring and Diagnostics

Technical requirements

Monitoring your environment

Setting up alerts

Summary

Chapter 11: Tuning and Resource Management

Technical requirements

Implementing resource governance with workload groups

Speeding up queries using cache policies

Summary

Chapter 12: Securing Your Environment

Technical requirements

Security overview

Managing data encryption

Authenticating users

Configuring access to resources

Implementing network security

Protecting against external threats

Summary

Chapter 13: Advanced Data Management

Technical requirements

Managing extents

Purging personal data

Summary

Index

What is Azure Synapse Data Explorer?

Before we talk about how Data Explorer is used in Azure Synapse, you may be asking: what is Azure Data Explorer anyway? Azure Data Explorer is a cloud-based big data platform that enables high-performance analytics on large volumes of unstructured, semi-structured, and structured data.

Azure Data Explorer grew out of a tool named Kusto, which was built internally at Microsoft for the exploration of telemetry data. The name was inspired by the French explorer Jacques Cousteau. Its query language is the Kusto Query Language (KQL). Microsoft still uses Azure Data Explorer extensively for telemetry data across its product teams.

At a high level, Azure Data Explorer has the following key features:

Data ingestion: Supports diverse ways to ingest data, including managed pipelines (for example, Event Grid or IoT Hub), connectors and plugins (for example, Kafka Connect or the Apache Spark connector), programmatic ingestion through software development kits (SDKs), and external data loading tools. It supports ingesting up to 200 MB of data per second per cluster node, and load performance scales linearly as you scale the service in and out.
Time-series analysis: Azure Data Explorer is optimized for time-series analysis and can process thousands of time series in a few seconds.
Cost-effective queries and storage: Azure Data Explorer is charged by compute hours, not by queries, and by the storage you use. To save on compute hours, you can stop your cluster manually when it is not in use and start it again when needed, or use auto-stop to stop it automatically after a certain period of inactivity. To optimize storage costs, retention policies let you control how long you keep your data. For long-term storage or cold data, you can always store your data on Azure Storage.
Fast read-only queries with high concurrency: Azure Data Explorer is a columnar store and offers fast text indexing. It allows you to retrieve data from a billion records in less than a second.
Fully managed and globally available: You do not need to worry about provisioning hardware, managing operating systems, patching, backup, or even the service infrastructure. Azure Data Explorer is a fully managed Platform-as-a-Service (PaaS) offering, so you only need to worry about your data. It is also globally available, so you can provision services closer to where your data is, which reduces network latency and helps you meet data residency requirements.
Enables custom solutions: Azure services such as Azure Monitor and Microsoft Sentinel are built with Azure Data Explorer in their backend. You can use the service's REST API and client libraries to build your own custom solutions on top of Azure Data Explorer.
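
The linear-scaling claim in the data ingestion item can be turned into a quick capacity estimate. The following is only a back-of-the-envelope sketch in Python. It assumes the 200 MB per second per node figure quoted above is an upper bound that applies uniformly to every node. The function names and the example workload are hypothetical and are not part of any Azure SDK.

```python
import math

# Upper bound quoted in the text: 200 MB of ingested data per second, per cluster node.
MAX_INGEST_MB_PER_SEC_PER_NODE = 200


def peak_ingest_mb_per_sec(node_count: int) -> int:
    """Theoretical peak ingestion rate, assuming perfectly linear scaling."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_count * MAX_INGEST_MB_PER_SEC_PER_NODE


def nodes_needed(target_mb_per_sec: float) -> int:
    """Smallest node count whose theoretical peak covers the target rate."""
    if target_mb_per_sec <= 0:
        raise ValueError("target_mb_per_sec must be positive")
    return math.ceil(target_mb_per_sec / MAX_INGEST_MB_PER_SEC_PER_NODE)


if __name__ == "__main__":
    # Hypothetical workload: a 4-node cluster, and a 1,000 MB/s ingestion target.
    print(peak_ingest_mb_per_sec(4))  # 800
    print(nodes_needed(1_000))        # 5
```

Actual throughput depends on factors such as data format, batching, and the compute size you select, so treat the result as a ceiling rather than a sizing guarantee.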

Note

This book explores Azure Synapse Data Explorer, and how it integrates with other Azure Synapse services. To learn more about the standalone service Azure Data Explorer and KQL, a good resource is Scalable Data Analytics with Azure Data Explorer, available at https://www.packtpub.com/product/scalable-data-analytics-with-azure-data-explorer/9781801078542.

Azure Synapse brings the standalone service Azure Data Explorer to Synapse workspaces, enabling you to complement SQL and Apache Spark pools with an interactive query experience optimized for log and telemetry data. As with dedicated SQL pools, Data Explorer pools are provisioned by you, and compute capacity is reserved while the pool is running. You select your desired cluster size based on your service-level requirements.

As expected, you can use Azure Synapse Studio to run queries on Data Explorer, resume and pause pools, manage the size of your pools by scaling up or down, and view details of your pool such as the instance count, CPU utilization, cache utilization, and more.

In Azure Synapse workspaces, you create KQL scripts in the Develop hub to explore data on Data Explorer pools. KQL has grown in popularity in recent years due to its adoption by other Azure services, such as Azure Monitor and Microsoft Sentinel.
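
To give a feel for what a KQL script looks like, and for how a custom solution might submit one through the REST API mentioned earlier, here is a minimal Python sketch. It rests on several assumptions: the `Telemetry` table and its `Timestamp` and `DeviceId` columns are hypothetical, the pool address is a placeholder, and the request shape (a POST to `/v2/rest/query` with `db` and `csl` fields) follows the query REST API of the standalone Azure Data Explorer service, which a Data Explorer pool is assumed to expose as well. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical KQL script: the table and column names are illustrative only.
KQL_QUERY = """
Telemetry
| where Timestamp > ago(1h)
| summarize EventCount = count() by DeviceId
| top 10 by EventCount desc
""".strip()


def build_query_request(
    cluster_uri: str, database: str, query: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Kusto query endpoint."""
    # Assumed request body: database name in "db", query text in "csl".
    body = json.dumps({"db": database, "csl": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{cluster_uri.rstrip('/')}/v2/rest/query",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_query_request(
        cluster_uri="https://<pool>.<workspace>.kusto.azuresynapse.net",  # placeholder
        database="<database>",        # placeholder
        query=KQL_QUERY,
        token="<access-token>",       # placeholder; obtain a real token from Azure AD
    )
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

The query reads like a pipeline: each `|` passes the rows from the previous step to the next operator, here filtering to the last hour, counting events per device, and keeping the ten busiest devices. In practice, the client libraries wrap this request handling for you, so a hand-built request like this one is mainly useful for understanding what travels over the wire.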
