Learn Azure Synapse Data Explorer

By: Pericles (Peri) Rocha

Overview of this book

Large volumes of data are generated daily from applications, websites, IoT devices, and other free-text, semi-structured data sources. Azure Synapse Data Explorer helps you collect, store, and analyze such data, and work with other analytical engines, such as Apache Spark, to develop advanced data science projects and maximize the value you extract from data. This book offers a comprehensive view of Azure Synapse Data Explorer, exploring not only the core scenarios of Data Explorer but also how it integrates within Azure Synapse. From data ingestion to data visualization and advanced analytics, you’ll learn to take an end-to-end approach to maximize the value of unstructured data and drive powerful insights using data science capabilities. With real-world usage scenarios, you’ll discover how to identify key projects where Azure Synapse Data Explorer can help you achieve your business goals. Throughout the chapters, you'll also find out how to manage big data as part of a software as a service (SaaS) platform, as well as tune, secure, and serve data to end users. By the end of this book, you’ll have mastered the big data life cycle and you'll be able to implement advanced analytical scenarios from raw telemetry and log data.
Table of Contents (19 chapters)
Part 1: Introduction to Azure Synapse Data Explorer
Part 2: Working with Data
Part 3: Managing Azure Synapse Data Explorer

What is Azure Synapse Data Explorer?

Before we talk about how Data Explorer is used in Azure Synapse, you may be asking: what is Azure Data Explorer anyway? Azure Data Explorer is a cloud-based big data platform that enables high-performance analytics on large volumes of unstructured, semi-structured, and structured data.

Azure Data Explorer grew out of Kusto, a tool built internally at Microsoft for exploring telemetry data. The name was inspired by the French explorer Jacques Cousteau. Its query language is the Kusto Query Language (KQL). Microsoft's product teams still use Azure Data Explorer extensively for telemetry data.

At a high level, Azure Data Explorer has the following key features:

  • Data ingestion: Supports many ways to ingest data, including managed pipelines (for example, Event Grid or IoT Hub), connectors and plugins (for example, Kafka Connect or the Apache Spark connector), programmatic ingestion through software development kits (SDKs), and external data loading tools. It supports ingesting up to 200 MB of data per second, per cluster node, and load performance scales linearly as you scale the service in and out.
  • Time-series analysis: Azure Data Explorer is optimized for time-series analysis and processes thousands of time series in a few seconds.
  • Cost-effective queries and storage: Azure Data Explorer is charged by compute hours, not by queries, and by the storage you use. To save on compute hours, you can stop your cluster manually when it is not in use and start it again when needed, or use auto-stop to stop it automatically after a certain period of inactivity. To save on storage, Azure Data Explorer offers retention policies, which let you control how long your data is kept. For long-term storage or cold data, you can always store your data on Azure Storage.
  • Fast read-only queries with high concurrency: Azure Data Explorer is a columnar store and offers fast text indexing. It allows you to retrieve data from a billion records in less than a second.
  • Fully managed and globally available: You do not need to worry about provisioning hardware, managing operating systems, patching, backup, or even the service infrastructure. Azure Data Explorer is a fully managed Platform-as-a-Service (PaaS) offering, so you only need to worry about your data. Also, it is globally available, allowing you to provision services closer to where your data is, reducing network latency and respecting data residency.
  • Enables custom solutions: Azure services such as Azure Monitor and Microsoft Sentinel are built with Azure Data Explorer on their backend. You can leverage the service’s REST API and client libraries to build your custom solutions on top of Azure Data Explorer.
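
To make the ingestion figure in the list above concrete, the following Python sketch estimates a best-case load time from the rate of 200 MB per second, per node. It assumes perfectly linear scaling and ignores batching, network, and source-side limits, so treat the result as a lower bound rather than a prediction. The function name and its parameters are illustrative and are not part of any Azure SDK.

```python
def estimated_ingest_seconds(
    data_mb: float,
    nodes: int,
    mb_per_sec_per_node: float = 200.0,  # peak rate quoted in the text
) -> float:
    """Best-case seconds to ingest data_mb megabytes.

    Assumption: throughput scales linearly with the number of nodes.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if data_mb < 0:
        raise ValueError("data_mb must not be negative")
    return data_mb / (nodes * mb_per_sec_per_node)


if __name__ == "__main__":
    # Compare a few cluster sizes for roughly 1 TB (1,000,000 MB) of data.
    for node_count in (2, 4, 8):
        seconds = estimated_ingest_seconds(1_000_000, node_count)
        print(f"{node_count} nodes: about {seconds / 60:.1f} minutes")
```

Doubling the node count halves the estimate, which is what scaling linearly means in practice. Real workloads will usually fall short of the peak rate.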

Note

This book explores Azure Synapse Data Explorer, and how it integrates with other Azure Synapse services. To learn more about the standalone service Azure Data Explorer and KQL, a good resource is Scalable Data Analytics with Azure Data Explorer, available at https://www.packtpub.com/product/scalable-data-analytics-with-azure-data-explorer/9781801078542.

Azure Synapse brings the standalone service Azure Data Explorer to Synapse workspaces, enabling you to complement SQL and Apache Spark pools with an interactive query experience optimized for log and telemetry data. As with dedicated SQL pools, Data Explorer pools are provisioned by you, and compute capacity is reserved while the pool is running. You select your desired cluster size based on your service-level requirements.

As expected, you can use Azure Synapse Studio to run queries on Data Explorer, resume and pause pools, manage the size of your pools by scaling up or down, and view details of your pool such as the instance count, CPU utilization, cache utilization, and more.

In Azure Synapse workspaces, you create KQL scripts in the Develop hub to explore data on Data Explorer pools. KQL has grown in popularity in recent years due to its adoption by other Azure services, such as Azure Monitor and Microsoft Sentinel.
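
As a rough illustration of what a KQL script looks like, the Python sketch below assembles the text of a simple query that counts error records per hour. The table name Telemetry and the columns Timestamp and Level are hypothetical, as is the helper function itself. The sketch only builds the query text and does not connect to a Data Explorer pool.

```python
def build_error_count_query(table: str, lookback_hours: int) -> str:
    """Return KQL text that counts error records per hour.

    Assumption: the table has a datetime column named Timestamp and a
    string column named Level (both hypothetical).
    """
    if not table.isidentifier():
        raise ValueError("table must be a simple identifier")
    if lookback_hours < 1:
        raise ValueError("lookback_hours must be at least 1")
    lines = [
        table,                                                     # source table
        f"| where Timestamp > ago({lookback_hours}h)",             # recent rows only
        '| where Level == "Error"',                                # keep error records
        "| summarize ErrorCount = count() by bin(Timestamp, 1h)",  # hourly buckets
        "| order by Timestamp asc",                                # chronological order
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_error_count_query("Telemetry", 24))
```

The printed text can be pasted into a KQL script. Each line after the table name pipes the previous result into the next operator, which is the general shape most KQL queries follow.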