Data Modeling with Snowflake

By : Serge Gershkovich

5 (2)

Buy this Book

Data Modeling with Snowflake

5 (2)

By: Serge Gershkovich

Buy this Book

Overview of this book

The Snowflake Data Cloud is one of the fastest-growing platforms for data warehousing and application workloads. Snowflake's scalable, cloud-native architecture and expansive set of features and objects enables you to deliver data solutions quicker than ever before. Yet, we must ensure that these solutions are developed using recommended design patterns and accompanied by documentation that’s easily accessible to everyone in the organization. This book will help you get familiar with simple and practical data modeling frameworks that accelerate agile design and evolve with the project from concept to code. These universal principles have helped guide database design for decades, and this book pairs them with unique Snowflake-native objects and examples like never before – giving you a two-for-one crash course in theory as well as direct application. By the end of this Snowflake book, you’ll have learned how to leverage Snowflake’s innovative features, such as time travel, zero-copy cloning, and change-data-capture, to create cost-effective, efficient designs through time-tested modeling principles that are easily digestible when coupled with real-world examples.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1: Core Concepts in Data Modeling and Snowflake Architecture

Free Chapter

Chapter 1: Unlocking the Power of Modeling

Technical requirements

Modeling with purpose

Leveraging the modeling toolkit

The benefits of database modeling

Operational and analytical modeling scenarios

A look at relational and transformational modeling

Summary

Further reading

References

Chapter 2: An Introduction to the Four Modeling Types

Logical

Summary

Chapter 3: Mastering Snowflake’s Architecture

Traditional architectures

Snowflake’s solution

Snowflake’s three-tier architecture

Snowflake’s features

Costs to consider

Saving cash by using cache

Summary

Further reading

Chapter 4: Mastering Snowflake Objects

Stages

Tables

Streams

Tasks

Summary

References

Chapter 5: Speaking Modeling through Snowflake Objects

Entities as tables

Attributes as columns

Constraints and enforcement

Identifiers as primary keys

Alternate keys as unique constraints

Relationships as foreign keys

Mandatory columns as NOT NULL constraints

Summary

Chapter 6: Seeing Snowflake’s Architecture through Modeling Notation

A history of relational modeling

RM versus entity-relationship diagram

Visual modeling conventions

The benefit of synchronized modeling

Summary

Part 2: Applied Modeling from Idea to Deployment

Chapter 7: Putting Conceptual Modeling into Practice

Embarking on conceptual design

Modeling in reverse

Summary

Further reading

Chapter 8: Putting Logical Modeling into Practice

Expanding from conceptual to logical modeling

Adding attributes

Cementing the relationships

Summary

Chapter 9: Database Normalization

An overview of database normalization

Data anomalies

Database normalization through examples

Data models on a spectrum of normalization

Summary

Chapter 10: Database Naming and Structure

Naming conventions

Organizing a Snowflake database

Summary

Chapter 11: Putting Physical Modeling into Practice

Technical requirements

Considerations before starting the implementation

Expanding from logical to physical modeling

Deploying a physical model

Creating an ERD from a physical model

Summary

Part 3: Solving Real-World Problems with Transformational Modeling

Chapter 12: Putting Transformational Modeling into Practice

Technical requirements

Separating the model from the object

Shaping transformations through relationships

Join elimination using constraints

Joins and set operators

Performance considerations and monitoring

Putting transformational modeling into practice

Summary

Chapter 13: Modeling Slowly Changing Dimensions

Technical requirements

Dimensions overview

Recipes for maintaining SCDs in Snowflake

Summary

Chapter 14: Modeling Facts for Rapid Analysis

Technical requirements

Fact table types

Fact table measures

Getting the facts straight

Maintaining fact tables using Snowflake features

Summary

Chapter 15: Modeling Semi-Structured Data

Technical requirements

The benefits of semi-structured data in Snowflake

Getting hands-on with semi-structured data

Schema-on-read != schema-no-need

Converting semi-structured data into relational data

Summary

Chapter 16: Modeling Hierarchies

Technical requirements

Understanding and distinguishing between hierarchies

Maintaining hierarchies in Snowflake

Summary

Chapter 17: Scaling Data Models through Modern Techniques

Technical requirements

Demystifying Data Vault 2.0

Modeling the data marts

Discovering Data Mesh

Summary

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a free PDF copy of this book

Appendix

Technical requirements

The exceptional time traveler

The secret column type Snowflake refuses to document

Read the functional manual (RTFM)

Summary

Customer Reviews

5 (2)

5 star

100%

4 star

3 star

2 star

1 star

A look at relational and transformational modeling

The previous section describes how modeling varies between operational and data warehouse scenarios. Before exploring the modeling process in detail, it’s helpful to understand the look and feel of relational and transformational modeling and what we’re working toward. Before proceeding, it would help to summarize the main differences between transactional databases and data warehouses. You can see what these are in the following table:

Transactional Database	Data Warehouse
Supports daily operations	Provides operational insight
Operates on single records	Summarizes many records
Accurate as of the present instant	Historical snapshots over time
Single source of truth (SSOT), non-redundant	Redundant to support different analyses
Data models defined by business operations	Data models generated by business questions
Static and structured data model	Inherited structure and dynamically transformed data model
Single-application data	Multiple sources of converging data

Figure 1.4 – Common differences between transactional databases and warehouses

Given these differences, the following sections demonstrate what modeling looks like in each system and what it aims to achieve.

What modeling looks like in operational systems

Completely ignoring the modeling workflow that got us here, which will be covered in later chapters, we can observe an example of the type of modeling most commonly seen in transactional systems. The physical diagram in Figure 1.5 serves both as a blueprint for declaring the required tables and a guide to understanding their business context.

Following modeling conventions (don’t worry if they are still unfamiliar—they will be covered thoroughly in the coming chapters), we can infer a lot of information from this simple diagram. For example, a person is uniquely identified by an eight-digit identifier (the primary key) and must have a Social Security number (SSN), driver’s license, name, and birth date.

The one-to-many relationship between the two tables establishes that while a person does not necessarily need to have an account created, an account must belong to just one person:

Figure 1.5 – A physical model using crow’s foot notation

These details, combined with the list of attributes, data types, and constraints, not only dictate what kinds of data can be written to these tables but also provide an idea of how the business operates. So, how does this differ in analytical databases?

What modeling looks like in analytical systems

In a data warehouse scenario, the PERSON and ACCOUNT tables would not be defined from scratch—they would be extracted from the source in which they exist and loaded—bringing both structure and data into the process. Then, the analytical transformations begin in answer to the organization’s business questions. This is a process known as Extract Transform Load (ETL). (Although ELT has become the preferred processing order, the original term stuck.)

Suppose the management team wanted to analyze which age groups (by decade) were opening which account types and they wanted to store the result in a separate table for an independent analysis.

The following diagram shows the resulting relational model of an object obtained through transformational analysis but provides no business context:

Figure 1.6 – A relational model of a transformational requirement

Although physical modeling could describe such a table (as seen in Figure 1.6)—containing the account type with age and count of accounts as integers—such a model would fail to communicate the most relevant details, presented here:

The logic used to perform the analysis
The relationship between the source tables and the output

The business requirement for ACCOUNT_TYPE_AGE_ANALYSIS in this example purposely excludes the source key fields from the target table, preventing the possibility of establishing any relational links. However, the relational model still serves a vital role: it tells us how the sources are related and how to join them correctly to produce the required analysis.

The logic could then be constructed by joining PERSON and ACCOUNT, as shown here:

CREATE TABLE account_types_age_analysis AS
SELECT
    a.account_type,
    ROUND(DATEDIFF(years, p.birth_date, CURRENT_DATE()), -1
    ) AS age_decade,
    COUNT(a.account_id) AS total_accounts
    FROM account AS  a
     INNER JOIN person AS p
     ON a.person_id = p.person_id
GROUP BY     1, 2;

Although there is no relational connection between ACCOUNT_TYPE_AGE_ANALYSIS and its sources, there is still a clear dependency on them and their columns. Instead of using ERDs, which convey entities and relationships, transformational pipelines are visualized through a lineage diagram. This type of diagram gives a column-level mapping from source to target, including all intermediate steps, as shown here:

Figure 1.7 – Transformational modeling seen visually

Paired with the SQL logic used to construct it, the lineage graph gives a complete picture of the transformational relationship between sources and targets in an analytical/warehousing scenario.

Having witnessed both relational and analytical approaches to modeling, it is clear that both play a vital role in navigating the complex dynamic environments that one is liable to encounter in an enterprise-scale Snowflake environment.

Although we have only skimmed the surface of what modeling entails and the unique features of the Snowflake platform that can be leveraged to this end, this chapter has hopefully given you an idea of the vital role that modeling plays in building, maintaining, and documenting database systems. Before diving into the specifics of verbal, technical, and visual modeling semantics of modeling in the chapters to come, let’s review what we learned.

Data Modeling with Snowflake

By : Serge Gershkovich

Data Modeling with Snowflake

By: Serge Gershkovich

Overview of this book

Related Content you might be interested in

Current Title:

Data Modeling with Snowflake

Snowflake Cookbook

Data Modeling for Azure Data Services

Data Engineering with dbt

A look at relational and transformational modeling

What modeling looks like in operational systems

What modeling looks like in analytical systems