
Scalable Data Analytics with Azure Data Explorer

By: Jason Myerscough

Overview of this book

Azure Data Explorer (ADX) enables developers and data scientists to make data-driven business decisions. This book will help you rapidly explore and query your data at scale and secure your ADX clusters. The book begins by introducing you to ADX, its architecture, core features, and benefits. You'll learn how to securely deploy ADX instances, navigate the ADX Web UI, explore data ingestion, and discover how to query and visualize your data using the powerful Kusto Query Language (KQL). Next, you'll get to grips with KQL operators and functions to efficiently query and explore your data, as well as perform time series analysis and search for anomalies and trends in your data. As you progress through the chapters, you'll explore advanced ADX topics, including deploying your ADX instances using Infrastructure as Code (IaC). The book also shows you how to manage your cluster performance and monthly ADX costs by handling cluster scaling and data retention periods. Finally, you'll understand how to secure your ADX environment by restricting access, and you'll learn best practices for improving your KQL query performance. By the end of this Azure book, you'll be able to securely deploy your own ADX instance, ingest data from multiple sources, rapidly query your data, and produce reports with KQL and Power BI.
Table of Contents (18 chapters)

Section 1: Introduction to Azure Data Explorer
Section 2: Querying and Visualizing Your Data
Section 3: Advanced Azure Data Explorer Topics

Azure Data Explorer use cases

Whenever someone asks what they should focus on when learning how to use Azure, I immediately say KQL. I use KQL daily, from managing cost and inventory to security and troubleshooting. It is not uncommon for relatively small environments to generate hundreds of gigabytes of data per day: infrastructure diagnostics, Azure Resource Manager (ARM) audit logs, user audit logs, application logs, and application performance data. This may seem small in the grand scheme of things when, as of 2021, the world generates quintillions of bytes of data per day, but it is still enough data to require a dedicated service such as ADX to analyze it.
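
To give a feel for the kind of query this involves, here is a minimal KQL sketch that summarizes daily record counts per log category. The table and column names (AppLogs, Category, Timestamp) are illustrative assumptions, not a real schema:

    // Illustrative sketch: estimate how many records each log category generates per day.
    // AppLogs, Category, and Timestamp are assumed names, not part of a real deployment.
    AppLogs
    | where Timestamp > ago(7d)
    | summarize DailyRecords = count() by Category, bin(Timestamp, 1d)
    | order by Timestamp asc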

IoT monitoring and telemetry

Look around at your environment: how many appliances and devices can you see that are connected to the network? I see light bulbs, sensors, thermostats, and fire alarms. There are billions of Internet of Things (IoT) devices in the world, all of them constantly generating data. Together with Azure's IoT services, ADX can ingest these high volumes of data and enable us to monitor our things and perform complex time series analysis so that we can identify anomalies and trends in our data.
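
As a taste of what this looks like in KQL, the following sketch builds an hourly time series per device and flags outliers with the built-in series_decompose_anomalies() function. The table and column names (IotTelemetry, DeviceId, Temperature, Timestamp) are assumptions used only for illustration:

    // Illustrative sketch: flag anomalous hourly temperature readings per device.
    // IotTelemetry, DeviceId, Temperature, and Timestamp are assumed names.
    IotTelemetry
    | where Timestamp > ago(7d)
    | make-series AvgTemp = avg(Temperature) default=0 on Timestamp step 1h by DeviceId
    | extend (Anomalies, Score, Baseline) = series_decompose_anomalies(AvgTemp, 1.5, -1, 'linefit')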

Log analysis

Imagine this scenario: you have just performed a lift-and-shift migration to Azure for your on-premises product, and since the application is not a true cloud-native solution, you are constrained in which Azure services you can use, such as load balancing. Azure Application Gateway, which is a load-balancing service, supports cookie-based session affinity, and the cookies are completely managed by Application Gateway. The application we migrated to Azure required specific values to be written in the cookie, which is not possible with the current version of Application Gateway, so we used HAProxy running on Linux virtual machines. The security team requires all products to support only TLS 1.2 and above. The problem is that not all of our clients support TLS 1.2, and if we simply disabled TLS 1.0 and 1.1, we would essentially break the service for those clients, which we do not want to do. Add to the equation that the server-side product is distributed across 15 Azure regions worldwide, with each region containing hundreds of HAProxy servers and no central logging! How can we analyze all this data to identify the clients that are not using TLS 1.2? The answer is Kusto.

We ingested the HAProxy log files and used KQL to analyze the log files and capture insights on TLS versioning and cipher information in seconds. With the queries, we were able to build near real-time dashboards for the support teams so they could reach out to clients and inform them when they would need to upgrade their software. With these insights, we were able to coordinate the TLS deprecation activities and execute them with no customer impact.
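
The exact queries depended on our log schema, but a simplified sketch of the idea looks like this (HAProxyLogs, TlsVersion, ClientIp, and Timestamp are assumed names used for illustration, not the actual production schema):

    // Illustrative sketch: find clients still connecting with TLS versions below 1.2.
    // HAProxyLogs, TlsVersion, ClientIp, and Timestamp are assumed names.
    HAProxyLogs
    | where Timestamp > ago(1d)
    | where TlsVersion !in ('TLSv1.2', 'TLSv1.3')
    | summarize Requests = count() by ClientIp, TlsVersion
    | order by Requests desc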

Most of the examples in this book focus on logging scenarios, and in Chapter 7, Identifying Patterns, Anomalies, and Trends in Your Data, we will learn about ADX's time series analysis features to identify patterns, anomalies, and trends in our data.