Snowflake Cookbook

By : Hamid Mahmood Qureshi, Hammad Sharif

Snowflake Cookbook

By: Hamid Mahmood Qureshi, Hammad Sharif

Overview of this book

Snowflake is a unique cloud-based data warehousing platform built from scratch to perform data management on the cloud. This book introduces you to Snowflake's unique architecture, which places it at the forefront of cloud data warehouses. You'll explore the compute model available with Snowflake, and find out how Snowflake allows extensive scaling through the virtual warehouses. You will then learn how to configure a virtual warehouse for optimizing cost and performance. Moving on, you'll get to grips with the data ecosystem and discover how Snowflake integrates with other technologies for staging and loading data. As you progress through the chapters, you will leverage Snowflake's capabilities to process a series of SQL statements using tasks to build data pipelines and find out how you can create modern data solutions and pipelines designed to provide high performance and scalability. You will also get to grips with creating role hierarchies, adding custom roles, and setting default roles for users before covering advanced topics such as data sharing, cloning, and performance optimization. By the end of this Snowflake book, you will be well-versed in Snowflake's architecture for building modern analytical solutions and understand best practices for solving commonly faced problems using practical recipes.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the color images

Conventions used

Sections

Get in touch

Reviews

Chapter 1: Getting Started with Snowflake

Technical requirements

Creating a new Snowflake instance

Creating a tailored multi-cluster virtual warehouse

Using the Snowflake WebUI and executing a query

Using SnowSQL to connect to Snowflake

Connecting to Snowflake with JDBC

Creating a new account admin user and understanding built-in roles

Free Chapter

Chapter 2: Managing the Data Life Cycle

Technical requirements

Managing a database

Managing a schema

Managing tables

Managing external tables and stages

Managing views in Snowflake

Chapter 3: Loading and Extracting Data into and out of Snowflake

Technical requirements

Configuring Snowflake access to private S3 buckets

Loading delimited bulk data into Snowflake from cloud storage

Loading delimited bulk data into Snowflake from your local machine

Loading Parquet files into Snowflake

Making sense of JSON semi-structured data and transforming to a relational view

Processing newline-delimited JSON (or NDJSON) into a Snowflake table

Processing near real-time data into a Snowflake table using Snowpipe

Extracting data from Snowflake

Chapter 4: Building Data Pipelines in Snowflake

Technical requirements

Creating and scheduling a task

Conjugating pipelines through a task tree

Querying and viewing the task history

Exploring the concept of streams to capture table-level changes

Combining the concept of streams and tasks to build pipelines that process changed data on a schedule

Converting data types and Snowflake's failure management

Managing context using different utility functions

Chapter 5: Data Protection and Security in Snowflake

Technical requirements

Setting up custom roles and completing the role hierarchy

Configuring and assigning a default role to a user

Delineating user management from security and role management

Configuring custom roles for managing access to highly secure data

Setting up development, testing, pre-production, and production database hierarchies and roles

Safeguarding the ACCOUNTADMIN role and users in the ACCOUNTADMIN role

Chapter 6: Performance and Cost Optimization

Technical requirements

Examining table schemas and deriving an optimal structure for a table

Identifying query plans and bottlenecks

Weeding out inefficient queries through analysis

Identifying and reducing unnecessary Fail-safe and Time Travel storage usage

Projections in Snowflake for performance

Reviewing query plans to modify table clustering

Optimizing virtual warehouse scale

Chapter 7: Secure Data Sharing

Technical requirements

Sharing a table with another Snowflake account

Sharing data through a view with another Snowflake account

Sharing a complete database with another Snowflake account and setting up future objects to be shareable

Creating reader accounts and configuring them for non-Snowflake sharing

Keeping costs in check when sharing data with non-Snowflake users

Chapter 8: Back to the Future with Time Travel

Technical requirements

Using Time Travel to return to the state of data at a particular time

Using Time Travel to recover from the accidental loss of table data

Identifying dropped databases, tables, and other objects and restoring them using Time Travel

Using Time Travel in conjunction with cloning to improve debugging

Using cloning to set up new environments based on the production environment rapidly

Chapter 9: Advanced SQL Techniques

Technical requirements

Managing timestamp data

Shredding date data to extract Calendar information

Unique counts and Snowflake

Managing transactions in Snowflake

Ordered analytics over window frames

Generating sequences in Snowflake

Chapter 10: Extending Snowflake Capabilities

Technical requirements

Creating a Scalar user-defined function using SQL

Creating a Table user-defined function using SQL

Creating a Scalar user-defined function using JavaScript

Creating a Table user-defined function using JavaScript

Connecting Snowflake with Apache Spark

Using Apache Spark to prepare data for storage on Snowflake

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Leave a review - let other readers know what you think

Customer Reviews

5 star

4 star

3 star

2 star

1 star

What this book covers

Chapter 1, Getting Started with Snowflake, walks you through the process of setting up an account and connecting with Snowflake via its Web UI. The chapter introduces the concept of a virtual warehouse and how a virtual warehouse can be used elastically according to the user's needs around the complexity of user queries. Then the chapter moves onto ecosystem support for Snowflake, showing how Snowflake can be connected and used with different tools using different interfaces.

Chapter 2, Managing the Data Life Cycle, provides a set of recipes that introduce how data can be managed with Snowflake. The chapter talks about typical DBMSes and data warehouse concepts and introduces nuances specific to Snowflake.

Chapter 3, Loading and Extracting Data into and out of Snowflake, guides you on staging and loading data. In this chapter, we explore methods provided by Snowflake to load data into a Snowflake table. We explore the loading of data from familiar cloud sources such as Amazon S3, Azure Blob Storage, and GCP Cloud Storage, and on-premise hosts. The chapter also provides an example of how near-real-time data loading works on Snowflake.

Chapter 4, Building Data Pipelines in Snowflake, explains the capabilities on offer with Snowflake to process a string of SQL statements through the concept of tasks. Tasks allow developers to create data pipelines that process data and perform various functions as they progress through the execution sequence. Tasks combined with the concept of streams enables the user to manage complex orchestration and scheduling patterns. This chapter deals with setting up pipelines using tasks and streams and applying different techniques for transforming data within tasks.

Chapter 5, Data Protection and Security in Snowflake, walks you through handling authentication and authorization on Snowflake. Authentication refers to letting a user connect, and authorization refers to verifying what objects a connected user can access. Snowflake provides granular controls to limit access for out-of-the-box and custom roles. This chapter will help you set up role hierarchies, add custom roles, and set default roles for users.

Chapter 6, Performance and Cost Optimization, enables you to exploit Snowflake's capabilities to optimize queries and performance through various built-in features such as caching, autoscaling, and automatically clustering tables. There is always an opportunity to positively influence the performance by tweaking table structures, introducing physicalization techniques, and optimizing your compute resources to the maximum. In this chapter, we explore some of these techniques, which can be used to make a Snowflake-based data warehouse run more efficiently and therefore at a lower cost. The chapter also explores optimization strategies for reducing unnecessary storage costs.

Chapter 7, Secure Data Sharing, details how to share data with other Snowflake customers as well as non-Snowflake customers. Traditional warehouse solutions share data by extracting the data out and sending it over transport mechanisms, compromising data security and leading to inefficiencies. Another downside to this is that as soon as the data is extracted, it is already out of date. Snowflake overcomes this by providing a unique data sharing solution that ensures reduced costs, reduced operational overhead, and always up-to-date data.

Chapter 8, Back to the Future with Time Travel, equips you to deal with unpleasant data issues such as not determining when data was changed or whether the data has been lost altogether. Snowflake provides a unique way of going back in time through the Time Travel feature. This chapter explores the various applications of the Time Travel feature and combines it with cloning to tackle common data loss and debugging issues.

Chapter 9, Advanced SQL Techniques, provides you with multiple advanced SQL techniques using a Snowflake data warehouse. These SQL techniques are essential from a data warehousing perspective, such as trend analysis, temporal analytics, managing sequences, unique counts, and managing processes as transactions.

Chapter 10, Extending Snowflake's Capabilities, provides you with techniques for extending a Snowflake data warehouse and integrating it to coexist and work with other technologies. The chapter walks you through user-defined functions, through which custom functionality can be achieved. The chapter also involves connecting Snowflake with Apache Spark and demonstrates how to perform data processing on the Spark engine.

Snowflake Cookbook

By : Hamid Mahmood Qureshi, Hammad Sharif

Snowflake Cookbook

By: Hamid Mahmood Qureshi, Hammad Sharif

Overview of this book

Related Content you might be interested in

Current Title:

Snowflake Cookbook

Data Modeling with Snowflake

Amazon Redshift Cookbook

What this book covers