Cloud Scale Analytics with Azure Data Services

By: Patrik Borosch

Overview of this book

Azure Data Lake, the modern data warehouse architecture, and related data services on Azure enable organizations to build their own customized analytical platform to fit any analytical requirements in terms of volume, speed, and quality. This book is your guide to the features and capabilities of Azure data services for storing, processing, and analyzing data of any size, whether structured, unstructured, or semi-structured. You will explore key techniques for ingesting and storing data and perform batch, streaming, and interactive analytics. The book also shows you how to overcome various challenges and complexities relating to productivity and scaling. Next, you will be able to develop and run massive data workloads to perform different actions. Using a cloud-based big data and modern data warehouse analytics setup, you will also be able to build secure, scalable data estates for enterprises. Finally, you will not only learn how to develop a data warehouse but also understand how to implement enterprise-grade security and auditing for big data programs. By the end of this Azure book, you will have learned how to develop a powerful and efficient analytical platform to meet enterprise needs.
Table of Contents (20 chapters)

Section 1: Data Warehousing and Considerations Regarding Cloud Computing
Section 2: The Storage Layer
Section 3: Cloud-Scale Data Integration and Data Transformation
Section 4: Data Presentation, Dashboarding, and Distribution

Talking about partitioning

When you need to load massive amounts of data into your database, partitioning can be another optimization option. That said, you should only start considering partitioning when you genuinely are confronted with massive amounts of data.

Do you remember the math behind the CCI and why it only performs well when you load around 63 to 100 million rows into your table (see the preceding section, Understanding CCI)? Now, you need to add another factor to this equation: the number of partitions that you are planning for your table.
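As a quick refresher on where the "around 63 million rows" figure comes from, here is a minimal sketch of the arithmetic: the 60 distributions of a dedicated SQL pool multiplied by the maximum size of a compressed columnstore rowgroup:

```python
# Minimum rows for every distribution to fill one compressed rowgroup.
DISTRIBUTIONS = 60             # fixed distribution count in a dedicated SQL pool
ROWS_PER_ROWGROUP = 1_048_576  # 2**20, max rows per compressed rowgroup

min_rows = DISTRIBUTIONS * ROWS_PER_ROWGROUP
print(f"{min_rows:,}")  # 62,914,560 (~63 million)
```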

Let's assume that you want to have one partition for every month (the most typical usage of partitions) in your table, and you plan to load data for 5 years into your database. This adds another factor of 60 to the preceding term: 60 distributions x 1,048,576 rows per distribution x 60 monthly partitions. This results in 3,774,873,600 rows that your table needs to hold as a minimum in order for the CCI to be built over all the partitions...
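The full calculation can be sketched as follows; the distribution count and rowgroup size are fixed by the dedicated SQL pool architecture, while the partition count (60 months) is this example's design choice:

```python
# Minimum rows for a partitioned table so that every partition,
# in every distribution, can fill one compressed columnstore rowgroup.
DISTRIBUTIONS = 60             # fixed distribution count in a dedicated SQL pool
ROWS_PER_ROWGROUP = 1_048_576  # 2**20, max rows per compressed rowgroup
PARTITIONS = 60                # one partition per month over 5 years

min_rows = DISTRIBUTIONS * ROWS_PER_ROWGROUP * PARTITIONS
print(f"{min_rows:,}")  # 3,774,873,600
```

Each extra partition you add multiplies this minimum again, which is why over-partitioning a columnstore table quickly undermines CCI compression.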