Book Image

Data Modeling with Snowflake

By : Serge Gershkovich
5 (2)
Book Image

Data Modeling with Snowflake

5 (2)
By: Serge Gershkovich

Overview of this book

The Snowflake Data Cloud is one of the fastest-growing platforms for data warehousing and application workloads. Snowflake's scalable, cloud-native architecture and expansive set of features and objects enables you to deliver data solutions quicker than ever before. Yet, we must ensure that these solutions are developed using recommended design patterns and accompanied by documentation that’s easily accessible to everyone in the organization. This book will help you get familiar with simple and practical data modeling frameworks that accelerate agile design and evolve with the project from concept to code. These universal principles have helped guide database design for decades, and this book pairs them with unique Snowflake-native objects and examples like never before – giving you a two-for-one crash course in theory as well as direct application. By the end of this Snowflake book, you’ll have learned how to leverage Snowflake’s innovative features, such as time travel, zero-copy cloning, and change-data-capture, to create cost-effective, efficient designs through time-tested modeling principles that are easily digestible when coupled with real-world examples.
Table of Contents (24 chapters)
1
Part 1: Core Concepts in Data Modeling and Snowflake Architecture
8
Part 2: Applied Modeling from Idea to Deployment
14
Part 3: Solving Real-World Problems with Transformational Modeling

Getting hands-on with semi-structured data

Although we will query semi-structured JSON data as part of this exercise, its storage still conforms to modeling best practices such as naming and standard columns. In this example, we will use semi-structured data containing information about pirates – such as details about the crew, weapons, and their ship – all stored in a single VARIANT data type. With relational data, a row represents a single entity; in semi-structured data, a row is an entire file (although the file itself can contain single or countless entities). For this reason, metadata columns to mark individual loads and source filenames are stored alongside VARIANT.

Figure 15.1 – A table with ELT meta columns and VARIANT for storing semi-structured data

Figure 15.1 – A table with ELT meta columns and VARIANT for storing semi-structured data

This example uses AUTOINCREMENT (a.k.a. IDENTITY) as the default to generate a sequential unique ID for each load/record inserted.

In a real-world scenario, semi-structured...