Engineering Data Mesh in Azure Cloud

By: Aniruddha Deswandikar

Overview of this book

Decentralizing data and centralizing governance are practical, scalable, and modern approaches to data analytics. However, implementing a data mesh can feel like changing the engine of a moving car. Most organizations struggle to start and get caught up in the concept of data domains, spending months trying to organize domains. This is where Engineering Data Mesh in Azure Cloud can help. The book starts by assessing your existing framework before helping you architect a practical design. As you progress, you’ll focus on the Microsoft Cloud Adoption Framework for Azure and the cloud-scale analytics framework, which will help you quickly set up a landing zone for your data mesh in the cloud. The book also resolves common challenges related to the adoption and implementation of a data mesh faced by real customers. It touches on the concepts of data contracts and helps you build practical data contracts that work for your organization. The last part of the book covers some common architecture patterns used for modern analytics frameworks such as artificial intelligence (AI). By the end of this book, you’ll be able to transform existing analytics frameworks into a streamlined data mesh using Microsoft Azure, thereby navigating challenges and implementing advanced architecture patterns for modern analytics workloads.

Collecting and managing metadata

In the previous section, we looked at how data can be cataloged using Microsoft Purview. The built-in Microsoft Purview scanners scan data sources and ingest basic technical metadata: file types, column names, column types, and basic out-of-the-box classifications. However, this initial technical metadata is extracted purely from the definitions available in the data source itself. Some data sources, such as Microsoft SQL Server, maintain a significant amount of metadata about the schema and its relationships. Others, such as CSV files stored in blob storage, carry no information other than a column header. Hence, after the initial scan and ingest cycle, the governance team needs to edit and enrich the metadata to make the data assets more meaningful.
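To make the gap concrete, the following is a minimal sketch in plain Python of what a scan can recover from a CSV file and what the governance team has to add by hand. It does not call Microsoft Purview or reproduce how its scanners work. The function names (`scan_csv`, `guess_type`, `enrich`), the sample data, and the annotation fields are all hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample file: the header is the only "schema" a CSV carries.
SAMPLE_CSV = "cust_id,amt,ctry\n1001,25.50,DE\n1002,13.00,FR\n"


def guess_type(values):
    """Guess a column type from its string values (int, then float, else string)."""
    if not values:
        return "string"
    for label, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "string"


def scan_csv(text):
    """Imitate a basic scan: column names from the header, types inferred from data."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows if row[name]]
        columns.append({"name": name, "type": guess_type(values)})
    return columns


def enrich(columns, annotations):
    """Merge manually curated business metadata into the scanned technical metadata."""
    enriched = []
    for column in columns:
        extra = annotations.get(column["name"], {})
        enriched.append({
            **column,
            "description": extra.get("description"),
            "classification": extra.get("classification"),
        })
    return enriched


# Technical metadata: everything the file itself can tell us.
technical = scan_csv(SAMPLE_CSV)

# Business metadata: supplied by people, not by the data source (hypothetical values).
annotations = {
    "cust_id": {"description": "Unique customer identifier", "classification": "Customer ID"},
    "amt": {"description": "Order amount in EUR"},
    "ctry": {"description": "ISO 3166-1 alpha-2 country code", "classification": "Country"},
}

enriched = enrich(technical, annotations)
for column in enriched:
    print(column)
```

The scan step yields only names and inferred types. Everything that tells a user what `amt` or `ctry` actually means arrives in the enrichment step, which is the manual work described above.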

The real advantage of cataloging data and making it searchable is to make data more meaningful to the users. Users searching...