Data Modeling for Azure Data Services

By : Peter ter Braake

Overview of this book

Data is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and the right design is paramount to a successful implementation. Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL Pool. As you progress through the chapters, you'll learn about data analytics, Azure Data Lake, and Azure SQL Data Warehouse, and explore dimensional modeling and Data Vault modeling, along with designing and implementing a Data Lake using Azure Storage. You'll also learn how to implement ETL with Azure Data Factory. By the end of this book, you'll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Section 1 – Operational/OLTP Databases

Chapter 1: Introduction to Databases

Overview of relational databases

Introduction to Structured Query Language

Impact of intended usage patterns on database design

Understanding relational theory

Keys

Types of workload

Summary

Chapter 2: Entity Analysis

Scope

Understanding entity relationship diagrams

Entities

Relationships

Creating your first ERD

Context of an ERD

Summary

Exercises

Chapter 3: Normalizing Data

When to use normalization as a design strategy

Preventing redundancy

The normalization steps

An alternative approach to normalizing data

Integrating separate results

Entity relationship diagram

Summary

Exercises

Chapter 4: Provisioning and Implementing an Azure SQL DB

Technical requirements

Understanding SQL Server data types

Quantifying the data model

Provisioning an Azure SQL database

Connecting to the database

Data definition language

Inserting data

Indexing

Summary

Chapter 5: Designing a NoSQL Database

Understanding big data

Understanding big data clusters

Getting to know Cosmos DB

Key-value databases

Other NoSQL databases

Extra considerations

Summary

Exercise

Chapter 6: Provisioning and Implementing an Azure Cosmos DB Database

Technical requirements

Provisioning a Cosmos DB database

Creating a container

Uploading documents to a container

Cosmos DB container settings

Importing data using the Azure Cosmos DB Data Migration tool

Summary

Section 2 – Analytics with a Data Lake and Data Warehouse

Chapter 7: Dimensional Modeling

Background to dimensional modeling

Understanding dimensional modeling

Steps in dimensional modeling

Designing dimensions

Designing fact tables

Using a Kimball data warehouse versus data marts

Summary

Exercise

Chapter 8: Provisioning and Implementing an Azure Synapse SQL Pool

Overview of Synapse Analytics

Provisioning a Synapse Analytics workspace

Creating a dedicated SQL pool

Implementing tables in Synapse SQL pools

Understanding workload management

Using PolyBase to load data

Connecting to and using a dedicated SQL pool

Summary

Chapter 9: Data Vault Modeling

Background to Data Vault modeling

Designing Hub tables

Designing Link tables

Designing Satellite tables

Using hash keys

Designing a Data Vault structure

Designing business vaults

Implementing a Data Vault

Summary

Exercise

Chapter 10: Designing and Implementing a Data Lake Using Azure Storage

Technical requirements

Background of data lakes

Modeling a data lake

Using different file formats

Choosing the proper file size

Provisioning an Azure storage account

Creating a data lake filesystem

Creating multiple storage accounts

Summary

Section 3 – ETL with Azure Data Factory

Chapter 11: Implementing ETL Using Azure Data Factory

Technical requirements

Introducing Azure Data Factory

Introducing the main components of Azure Data Factory

Using the copy activity

Implementing a data flow

Executing SQL code from Data Factory

Summary

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Designing dimensions

The first thing to look at is the primary key to use for a dimension table.

Defining the primary key of a dimension table

To get straight to the point: we always use surrogate keys for dimension tables. Chapter 1, Introduction to Databases, discussed logical versus surrogate keys, so we will not repeat that discussion here.
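The core idea of a surrogate key can be shown in a few lines. Each logical (business) key coming from a source system is mapped once to a compact, meaningless integer, and that integer is reused whenever the same business key appears again. The following is a minimal sketch in Python, not the book's implementation. The class name `DimensionKeyMap`, its method `surrogate_for`, and the sample business keys are hypothetical. In a real dimension table, an identity column or sequence would typically do this work.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionKeyMap:
    """Hypothetical in-memory stand-in for a dimension table's key lookup.

    Maps a logical (business) key from a source system to a compact
    integer surrogate key, similar to what an identity column provides.
    """
    next_key: int = 1
    keys: dict[str, int] = field(default_factory=dict)

    def surrogate_for(self, business_key: str) -> int:
        # Assign the next integer only the first time a business key is seen;
        # afterwards, always return the same surrogate key for it.
        if business_key not in self.keys:
            self.keys[business_key] = self.next_key
            self.next_key += 1
        return self.keys[business_key]


customers = DimensionKeyMap()
print(customers.surrogate_for("CUST-00042"))  # 1
print(customers.surrogate_for("NL-ACME-BV"))  # 2
print(customers.surrogate_for("CUST-00042"))  # 1 again: same business key
```

Business keys of different lengths and formats all end up as small integers of a fixed size. That fixed size is what makes surrogate keys attractive as foreign keys in large fact tables.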

In a star schema database model, using an efficient primary key is even more important than in a normalized OLTP database. Earlier examples showed that fact tables can become very large in terms of the number of rows they store. Suppose you have a fact table with seven dimensions and 1 billion rows. Every row stores one key for each dimension, so the difference between using 4-byte keys and 8-byte keys is 7 × 4 bytes × 1,000,000,000 = 28,000,000,000 bytes, which is 28 GB. Some people might argue that 28 GB is not really worth considering today. But you might have a lot more rows than...
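The key-size arithmetic above can be reproduced directly. The following Python sketch assumes that the only difference between the two designs is the width of the seven foreign-key columns. It ignores row overhead, indexes, and compression, which would all change the real on-disk numbers. The function name `key_storage_bytes` is hypothetical. The 4-byte and 8-byte sizes correspond, for example, to SQL Server's `int` and `bigint` types.

```python
def key_storage_bytes(rows: int, dimensions: int, key_size_bytes: int) -> int:
    """Bytes used by the dimension foreign-key columns of a fact table.

    Assumes one key per dimension per row and ignores all other overhead.
    """
    return rows * dimensions * key_size_bytes


rows = 1_000_000_000   # 1 billion fact rows
dimensions = 7         # seven dimension keys per fact row

int_keys = key_storage_bytes(rows, dimensions, 4)     # 4-byte keys, e.g. int
bigint_keys = key_storage_bytes(rows, dimensions, 8)  # 8-byte keys, e.g. bigint
difference = bigint_keys - int_keys

print(difference)                                # 28000000000 bytes
print(difference / 10**9, "GB")                  # 28.0 GB (decimal gigabytes)
print(round(difference / 2**30, 1), "GiB")       # 26.1 GiB (binary gibibytes)
```

Because the result scales linearly with both the row count and the number of dimensions, doubling either one doubles the difference.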
