Engineering Data Mesh in Azure Cloud

By: Aniruddha Deswandikar

Overview of this book

Decentralizing data and centralizing governance are practical, scalable, and modern approaches to data analytics. However, implementing a data mesh can feel like changing the engine of a moving car. Most organizations struggle to start and get caught up in the concept of data domains, spending months trying to organize domains. This is where Engineering Data Mesh in Azure Cloud can help. The book starts by assessing your existing framework before helping you architect a practical design. As you progress, you’ll focus on the Microsoft Cloud Adoption Framework for Azure and the cloud-scale analytics framework, which will help you quickly set up a landing zone for your data mesh in the cloud. The book also resolves common challenges related to the adoption and implementation of a data mesh faced by real customers. It touches on the concepts of data contracts and helps you build practical data contracts that work for your organization. The last part of the book covers some common architecture patterns used for modern analytics frameworks such as artificial intelligence (AI). By the end of this book, you’ll be able to transform existing analytics frameworks into a streamlined data mesh using Microsoft Azure, thereby navigating challenges and implementing advanced architecture patterns for modern analytics workloads.

Collecting and managing metadata

In the previous section, we looked at how data can be cataloged using Microsoft Purview. Purview's built-in scanners ingest basic technical metadata from data sources: file types, column names, column types, and basic out-of-the-box classifications. However, this initial technical metadata is derived purely from the definitions available in the data source itself. Some data sources, such as Microsoft SQL Server, maintain a significant amount of information about the schema and its relationships. Others, such as CSV files stored in blob storage, carry nothing beyond a column header. Hence, after the initial scan-and-ingest cycle, the governance team needs to edit and enhance the metadata to make the data assets more meaningful.
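To make the gap between scanned and curated metadata concrete, here is a minimal Python sketch. It does not use Microsoft Purview or reproduce its scanner logic; the sample CSV, the function names (infer_type, scan_csv, enrich), and the business descriptions are all hypothetical and exist only for illustration. It shows how little a CSV file reveals on its own (column names and guessed types) and how a governance team's curated annotations might be layered on top.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file held in blob storage.
SAMPLE_CSV = "cust_id,sgmt,ltv\n1001,A,2500.50\n1002,B,310.00\n"


def infer_type(values):
    """Guess a basic column type from sample values (illustrative only)."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"


def scan_csv(text):
    """Return only the technical metadata recoverable from the file itself."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    return {
        name: {"type": infer_type([row[i] for row in data])}
        for i, name in enumerate(header)
    }


def enrich(technical, business):
    """Overlay curated business metadata on the scanned technical metadata."""
    return {
        column: {**meta, **business.get(column, {})}
        for column, meta in technical.items()
    }


# Step 1: what an automated scan can see (names and inferred types only).
technical = scan_csv(SAMPLE_CSV)

# Step 2: what the governance team adds by hand (assumed example values).
business = {
    "cust_id": {"description": "Unique customer identifier"},
    "sgmt": {"description": "Marketing segment code"},
    "ltv": {"description": "Customer lifetime value", "unit": "USD"},
}
catalog_entry = enrich(technical, business)
```

In this sketch, a cryptic header such as sgmt only becomes meaningful to someone searching the catalog once the curated description is attached, which is the enhancement work described above.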

The real advantage of cataloging data and making it searchable is to make data more meaningful to the users. Users searching...