
Scalable Data Analytics with Azure Data Explorer

By: Jason Myerscough

Overview of this book

Azure Data Explorer (ADX) enables developers and data scientists to make data-driven business decisions. This book will help you rapidly explore and query your data at scale and secure your ADX clusters. The book begins by introducing you to ADX, its architecture, core features, and benefits. You'll learn how to securely deploy ADX instances and navigate the ADX Web UI, cover data ingestion, and discover how to query and visualize your data using the powerful Kusto Query Language (KQL). Next, you'll get to grips with KQL operators and functions to query and explore your data efficiently, as well as perform time series analysis and search for anomalies and trends in your data. As you progress through the chapters, you'll explore advanced ADX topics, including deploying your ADX instances using Infrastructure as Code (IaC). The book also shows you how to manage cluster performance and monthly ADX costs by handling cluster scaling and data retention periods. Finally, you'll learn how to secure your ADX environment by restricting access, and you'll pick up best practices for improving your KQL query performance. By the end of this Azure book, you'll be able to securely deploy your own ADX instance, ingest data from multiple sources, rapidly query your data, and produce reports with KQL and Power BI.
Table of Contents (18 chapters)
Section 1: Introduction to Azure Data Explorer
Section 2: Querying and Visualizing Your Data
Section 3: Advanced Azure Data Explorer Topics

Introducing workload groups

I remember working on a big data project where a wide range of end users and applications used our clusters. At one end of the spectrum, engineers executed ad hoc queries to analyze application logs; at the other end, product management and customer support teams ran complex reports through integrations with third-party tools, such as Power BI, to gain insights into usage patterns and statistics. At the end of each month, the team would start receiving phone calls and tickets about query and job performance. Users complained that their jobs were either not running or timing out. It turned out that the customer support team was running jobs and reports to generate billing information, and these jobs were so resource-intensive that they consumed all of the cluster's resources, causing other jobs to queue or time out. The only way to resolve the issue was to log in to the cluster and kill the long-running tasks.

Managing...