Book Image

Snowflake Cookbook

By : Hamid Mahmood Qureshi, Hammad Sharif
Book Image

Snowflake Cookbook

By: Hamid Mahmood Qureshi, Hammad Sharif

Overview of this book

Snowflake is a unique cloud-based data warehousing platform built from scratch to perform data management on the cloud. This book introduces you to Snowflake's unique architecture, which places it at the forefront of cloud data warehouses. You'll explore the compute model available with Snowflake, and find out how Snowflake allows extensive scaling through the virtual warehouses. You will then learn how to configure a virtual warehouse for optimizing cost and performance. Moving on, you'll get to grips with the data ecosystem and discover how Snowflake integrates with other technologies for staging and loading data. As you progress through the chapters, you will leverage Snowflake's capabilities to process a series of SQL statements using tasks to build data pipelines and find out how you can create modern data solutions and pipelines designed to provide high performance and scalability. You will also get to grips with creating role hierarchies, adding custom roles, and setting default roles for users before covering advanced topics such as data sharing, cloning, and performance optimization. By the end of this Snowflake book, you will be well-versed in Snowflake's architecture for building modern analytical solutions and understand best practices for solving commonly faced problems using practical recipes.
Table of Contents (12 chapters)

What this book covers

Chapter 1, Getting Started with Snowflake, walks you through the process of setting up an account and connecting with Snowflake via its Web UI. The chapter introduces the concept of a virtual warehouse and how a virtual warehouse can be used elastically according to the user's needs around the complexity of user queries. Then the chapter moves onto ecosystem support for Snowflake, showing how Snowflake can be connected and used with different tools using different interfaces.

Chapter 2, Managing the Data Life Cycle, provides a set of recipes that introduce how data can be managed with Snowflake. The chapter talks about typical DBMSes and data warehouse concepts and introduces nuances specific to Snowflake.

Chapter 3, Loading and Extracting Data into and out of Snowflake, guides you on staging and loading data. In this chapter, we explore methods provided by Snowflake to load data into a Snowflake table. We explore the loading of data from familiar cloud sources such as Amazon S3, Azure Blob Storage, and GCP Cloud Storage, and on-premise hosts. The chapter also provides an example of how near-real-time data loading works on Snowflake.

Chapter 4, Building Data Pipelines in Snowflake, explains the capabilities on offer with Snowflake to process a string of SQL statements through the concept of tasks. Tasks allow developers to create data pipelines that process data and perform various functions as they progress through the execution sequence. Tasks combined with the concept of streams enables the user to manage complex orchestration and scheduling patterns. This chapter deals with setting up pipelines using tasks and streams and applying different techniques for transforming data within tasks.

Chapter 5, Data Protection and Security in Snowflake, walks you through handling authentication and authorization on Snowflake. Authentication refers to letting a user connect, and authorization refers to verifying what objects a connected user can access. Snowflake provides granular controls to limit access for out-of-the-box and custom roles. This chapter will help you set up role hierarchies, add custom roles, and set default roles for users.

Chapter 6, Performance and Cost Optimization, enables you to exploit Snowflake's capabilities to optimize queries and performance through various built-in features such as caching, autoscaling, and automatically clustering tables. There is always an opportunity to positively influence the performance by tweaking table structures, introducing physicalization techniques, and optimizing your compute resources to the maximum. In this chapter, we explore some of these techniques, which can be used to make a Snowflake-based data warehouse run more efficiently and therefore at a lower cost. The chapter also explores optimization strategies for reducing unnecessary storage costs.

Chapter 7, Secure Data Sharing, details how to share data with other Snowflake customers as well as non-Snowflake customers. Traditional warehouse solutions share data by extracting the data out and sending it over transport mechanisms, compromising data security and leading to inefficiencies. Another downside to this is that as soon as the data is extracted, it is already out of date. Snowflake overcomes this by providing a unique data sharing solution that ensures reduced costs, reduced operational overhead, and always up-to-date data.

Chapter 8, Back to the Future with Time Travel, equips you to deal with unpleasant data issues such as not determining when data was changed or whether the data has been lost altogether. Snowflake provides a unique way of going back in time through the Time Travel feature. This chapter explores the various applications of the Time Travel feature and combines it with cloning to tackle common data loss and debugging issues.

Chapter 9, Advanced SQL Techniques, provides you with multiple advanced SQL techniques using a Snowflake data warehouse. These SQL techniques are essential from a data warehousing perspective, such as trend analysis, temporal analytics, managing sequences, unique counts, and managing processes as transactions.

Chapter 10, Extending Snowflake's Capabilities, provides you with techniques for extending a Snowflake data warehouse and integrating it to coexist and work with other technologies. The chapter walks you through user-defined functions, through which custom functionality can be achieved. The chapter also involves connecting Snowflake with Apache Spark and demonstrates how to perform data processing on the Spark engine.