Data Modeling for Azure Data Services

By Peter ter Braake

Overview of this book

Data is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and the right design is paramount to a successful implementation. Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL Pool. As you progress through the chapters, you'll learn about data analytics, Azure Data Lake, and Azure SQL Data Warehouse, and explore dimensional modeling and data vault modeling, along with designing and implementing a data lake using Azure Storage. You'll also learn how to implement ETL with Azure Data Factory. By the end of this book, you'll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.
Table of Contents (16 chapters)

Section 1 – Operational/OLTP Databases
Section 2 – Analytics with a Data Lake and Data Warehouse
Section 3 – ETL with Azure Data Factory

Using different file formats

Storage is said to be cheap nowadays. That does not mean that we should waste our money. When you store a lot of data and pay for the volume of data stored, it pays to compress your data.

When you use the on-demand options of Azure Databricks or Azure Synapse Analytics to process data in a data lake, it also pays to reduce the total duration of the processing. Both storage costs and processing time are reasons to take a closer look at the big data file formats that originated on the Hadoop platform.
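
As a rough illustration of the storage effect, the following Python sketch (not taken from the book; the file names are hypothetical placeholders and the pyarrow package is assumed to be installed) converts a delimited text file to a Snappy-compressed Parquet file and compares the resulting file sizes:

import os
import pandas as pd

# Read a delimited text (CSV) file; "sales.csv" is a placeholder name.
df = pd.read_csv("sales.csv")

# Write the same data as a Snappy-compressed Parquet file.
df.to_parquet("sales.parquet", compression="snappy")

# Compare on-disk sizes; the columnar, compressed Parquet file is
# typically a fraction of the size of the original text file.
print(os.path.getsize("sales.csv"), os.path.getsize("sales.parquet"))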

PolyBase in Synapse Analytics can work with delimited text files, ORC files, and Parquet files. Azure Data Factory can also work with AVRO files. Other processing platforms might support additional file types. AVRO, Parquet, and ORC evolved in Hadoop to decrease the cost of storage and compute. With the right file format, you can do the following (see the sketch after this list):

  • Increase read performance
  • Increase write performance
  • Split files to get more parallelism
  • Add support...
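
As a minimal sketch of reading the formats mentioned above with one of these processing engines (assuming a Spark environment such as Azure Databricks with the spark-avro package available; the paths are hypothetical placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-formats").getOrCreate()

# Delimited text file with a header row.
csv_df = spark.read.option("header", True).csv("/mnt/datalake/sales/csv/")

# Columnar formats that PolyBase can also read: Parquet and ORC.
parquet_df = spark.read.parquet("/mnt/datalake/sales/parquet/")
orc_df = spark.read.orc("/mnt/datalake/sales/orc/")

# AVRO, a row-oriented format supported by Azure Data Factory;
# in Spark it requires the spark-avro package on the cluster.
avro_df = spark.read.format("avro").load("/mnt/datalake/sales/avro/")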