Data Modeling for Azure Data Services

By: Peter ter Braake

Overview of this book

Data is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and implementing the right design become paramount to a successful implementation. Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL Pool. As you progress through the chapters, you'll learn about data analytics, Azure Data Lake, and Azure SQL Data Warehouse, and explore dimensional modeling and data vault modeling, along with designing and implementing a Data Lake using Azure Storage. You'll also learn how to implement ETL with Azure Data Factory. By the end of this book, you'll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.
Table of Contents (16 chapters)
  1. Section 1 – Operational/OLTP Databases
  8. Section 2 – Analytics with a Data Lake and Data Warehouse
  13. Section 3 – ETL with Azure Data Factory

Understanding big data clusters

Scalability is an important part of working with modern data solutions. It determines how well a system keeps functioning as it grows. Growth can mean any or all of the following:

  • The system needs to handle more concurrent users.
  • The volume of the data we need to store increases.
  • The compute power needed increases because the query complexity increases.

The last two points are about being able to utilize more hardware resources. The main resources we need to consider are compute and storage. Compute refers to the number of CPU cores being used. Storage can mean keeping data on actual hard drives or keeping it in memory, although in the end data must always be persisted to hard drives.
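To make the two resources concrete, the following minimal Python sketch reports the compute (logical CPU cores) and disk storage visible on a single machine. It is only a local illustration using the Python standard library, not anything Azure-specific; the function name `describe_resources` and its `path` parameter are hypothetical names chosen for this example.

```python
import os
import shutil


def describe_resources(path="."):
    """Return the compute and disk resources visible to this process.

    `path` is any location on the volume to inspect (an assumption of
    this sketch; it defaults to the current directory).
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        # Logical cores; os.cpu_count() may return None if undetermined.
        "cpu_cores": os.cpu_count(),
        "disk_total_gb": usage.total / 1e9,
        "disk_free_gb": usage.free / 1e9,
    }


if __name__ == "__main__":
    for name, value in describe_resources().items():
        print(f"{name}: {value}")
```

Scaling the hardware means raising these numbers, for example by adding cores or disk capacity.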

Hardware scalability is about adding more hardware resources to our database. The second part of scalability has to do with whether our database will actually benefit from the extra hardware. This is...