Snowflake Cookbook

By: Hamid Mahmood Qureshi, Hammad Sharif

Overview of this book

Snowflake is a unique cloud-based data warehousing platform built from scratch to perform data management on the cloud. This book introduces you to Snowflake's unique architecture, which places it at the forefront of cloud data warehouses. You'll explore the compute model available with Snowflake, and find out how Snowflake allows extensive scaling through the virtual warehouses. You will then learn how to configure a virtual warehouse for optimizing cost and performance. Moving on, you'll get to grips with the data ecosystem and discover how Snowflake integrates with other technologies for staging and loading data. As you progress through the chapters, you will leverage Snowflake's capabilities to process a series of SQL statements using tasks to build data pipelines and find out how you can create modern data solutions and pipelines designed to provide high performance and scalability. You will also get to grips with creating role hierarchies, adding custom roles, and setting default roles for users before covering advanced topics such as data sharing, cloning, and performance optimization. By the end of this Snowflake book, you will be well-versed in Snowflake's architecture for building modern analytical solutions and understand best practices for solving commonly faced problems using practical recipes.
Table of Contents (12 chapters)

Conjugating pipelines through a task tree

In this recipe, we will connect multiple tasks together in a tree to produce a data pipeline that performs multiple functions as it executes.
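
Before working through the Snowflake steps, it may help to see the ordering guarantee of a task tree in isolation. The following is a minimal conceptual sketch in Python, not Snowflake code: the `Task` class, the `run_tree` function, and the task names are all hypothetical and exist only for illustration. It assumes that only the root is triggered (in Snowflake, by its schedule) and that each child runs after its parent finishes. Siblings are run one after another here for simplicity, which is an assumption of the sketch rather than a statement about how Snowflake orders them.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """One node in a hypothetical task tree (not a Snowflake API object)."""
    name: str
    action: Callable[[], None]
    children: List["Task"] = field(default_factory=list)

    def then(self, child: "Task") -> "Task":
        """Attach a child that runs only after this task has finished."""
        self.children.append(child)
        return child


def run_tree(root: Task, log: List[str]) -> None:
    """Run a task, then each of its child subtrees; a parent always runs first."""
    root.action()
    log.append(root.name)
    for child in root.children:
        run_tree(child, log)


# Illustrative pipeline: refresh a summary table, then run follow-up steps.
state = {"rows": 0}

refresh = Task("refresh_summary", lambda: state.update(rows=100))
audit = refresh.then(
    Task("write_audit_row", lambda: state.update(audited=state["rows"]))
)
refresh.then(Task("purge_old_rows", lambda: state.update(purged=True)))
audit.then(Task("notify", lambda: state.update(notified=True)))

execution_order: List[str] = []
run_tree(refresh, execution_order)  # stands in for the root's scheduled run
print(execution_order)
```

Because `write_audit_row` reads the row count that `refresh_summary` wrote, it only produces a meaningful result when the parent has already run. That dependency is the reason for chaining tasks in a tree rather than scheduling each one independently.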

Getting ready

The following steps describe the various ways to create and schedule a task. Note that these steps can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

To demonstrate the concept of a task tree, we will first create an aggregation query that we assume is being used in a report. We also assume that the query takes a long time to run, so we will save its results to a physical table and then refresh that table periodically through a scheduled task. The steps are as follows:

  1. To simplify the process for you, we have used the sample data provided by Snowflake and created an aggregation query on top of that. (Please note that sample data is included with your Snowflake instance and can be found under the SNOWFLAKE_SAMPLE_DATA...