Snowflake Cookbook

By: Hamid Mahmood Qureshi, Hammad Sharif

Overview of this book

Snowflake is a cloud-based data warehousing platform built from scratch to perform data management on the cloud. This book introduces you to Snowflake's unique architecture, which places it at the forefront of cloud data warehouses. You'll explore the compute model available with Snowflake and find out how Snowflake allows extensive scaling through virtual warehouses. You will then learn how to configure a virtual warehouse to optimize cost and performance. Moving on, you'll get to grips with the data ecosystem and discover how Snowflake integrates with other technologies for staging and loading data. As you progress through the chapters, you will use tasks to run a series of SQL statements as data pipelines, and find out how to create modern data solutions designed for high performance and scalability. You will also learn to create role hierarchies, add custom roles, and set default roles for users before covering advanced topics such as data sharing, cloning, and performance optimization. By the end of this Snowflake book, you will be well-versed in Snowflake's architecture for building modern analytical solutions and understand best practices for solving commonly faced problems using practical recipes.

Managing a database

In this recipe, we will create a new database with default settings and walk through several variations on the database creation process. The recipe also covers how to minimize storage usage when creating databases, and how and when to set up the replication of databases across regions.

Getting ready

This recipe describes the various ways to create a new database in Snowflake. These steps can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

Let's start with the creation of a database in Snowflake:

  1. The basic syntax for creating a new database is fairly straightforward. We will be creating a new database that is called our_first_database. We are assuming that the database doesn't exist already:
    CREATE DATABASE our_first_database
    COMMENT = 'Our first database';

    The command should successfully execute with the following message:

    Figure 2.1 – Database successfully created

  2. Let's verify that the database has been created successfully and review the defaults that have been set up by Snowflake:
    SHOW DATABASES LIKE 'our_first_database';

    The query should return one row showing information about the newly created database, such as the database name, owner, comments, and retention time. Notice that retention_time is set to 1 and the options column is blank:

    Figure 2.2 – Information of the newly created database

  3. Let's create another database with the time travel duration set to 15 days (setting the time travel duration above 1 day requires at least the Enterprise Edition of Snowflake):
    CREATE DATABASE production_database 
    DATA_RETENTION_TIME_IN_DAYS = 15
    COMMENT = 'Critical production database';
    SHOW DATABASES LIKE 'production_database';

    The output of SHOW DATABASES should now show retention_time as 15, indicating that the time travel duration for the database is 15 days:

    Figure 2.3 – SHOW DATABASES output

  4. While time travel is normally required for production databases, you normally don't need time travel or fail-safe for temporary databases, such as those used in ETL processing. Removing both reduces storage costs. Let's see how that is done:
    CREATE TRANSIENT DATABASE temporary_database 
    DATA_RETENTION_TIME_IN_DAYS = 0
    COMMENT = 'Temporary database for ETL processing';
    SHOW DATABASES LIKE 'temporary_database';

    The output of SHOW DATABASES should show retention_time as 0, indicating that there is no time travel storage for this database. The options column should show TRANSIENT, which means that there is no fail-safe storage for this database.

  5. The time travel configuration can also be changed later with ALTER DATABASE:
    ALTER DATABASE temporary_database
    SET DATA_RETENTION_TIME_IN_DAYS = 1;
    SHOW DATABASES LIKE 'temporary_database';
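
To compare the three databases created in the steps above side by side, the SHOW output can be post-processed with a regular query. The following is a minimal sketch, assuming the statements run one after the other in the same session. The pattern '%_database' is only a convenience and will also match any other database whose name ends the same way.

```sql
-- List every database whose name ends in "database".
SHOW DATABASES LIKE '%_database';

-- Re-read the SHOW output as a table and keep only the columns discussed in this recipe.
-- SHOW output column names are lowercase, so they must be double-quoted.
SELECT "name", "retention_time", "options", "comment"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
ORDER BY "name";
```

RESULT_SCAN reads the result of the previous statement, so the SELECT must be the next statement executed after SHOW DATABASES.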

How it works…

The basic CREATE DATABASE command creates a database with the defaults set at the account level. If you have not changed the defaults, the default for time travel is 1 day, which is the value that appears in retention_time when you run the SHOW DATABASES command. The database will also have fail-safe enabled automatically. Both features consume storage, and in certain cases you might want to reduce those storage costs. For example, databases that are used for temporary ETL processing can easily be configured to avoid these costs.
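
To see which account-level default a plain CREATE DATABASE would pick up, the retention parameter can be inspected directly. This is a sketch only: it assumes that your current role is allowed to view account parameters, and the ALTER ACCOUNT line is left commented out because it normally requires the ACCOUNTADMIN role and changes the default for new objects across the whole account.

```sql
-- Show the account-wide default for time travel retention.
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN ACCOUNT;

-- Show the value in effect for one of the databases created in this recipe.
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN DATABASE our_first_database;

-- Changing the account default (assumption: run as ACCOUNTADMIN; uncomment deliberately).
-- ALTER ACCOUNT SET DATA_RETENTION_TIME_IN_DAYS = 1;
```

The level column in the output indicates where the value was set, which helps distinguish an inherited default from an explicit database-level setting.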

A key thing to know about databases and tables used for ETL processing is that the data in those tables is repeatedly inserted and deleted. If such tables are not specifically configured, you will unnecessarily incur storage costs for the time travel and fail-safe data that is retained with every change to those tables. We set such databases to be transient (with TRANSIENT) so that the tables in them have no fail-safe storage. This does mean that such databases are not protected by fail-safe if a data loss event occurs, but for temporary databases and tables, this should not be an issue. We have also set time travel to zero so that no time travel storage is kept either.
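
As an illustration of the ETL case, the sketch below creates a staging table inside temporary_database and then lists the tables in that database. The table name stg_orders and its columns are hypothetical and not part of the recipe. It relies on the PUBLIC schema that Snowflake creates in every new database.

```sql
-- Hypothetical staging table; the name and columns are illustrative only.
CREATE TABLE temporary_database.public.stg_orders (
    order_id  INTEGER,
    loaded_at TIMESTAMP_NTZ
);

-- The "kind" column should report the table as TRANSIENT, and "retention_time"
-- should reflect the database-level setting in effect when you run this.
SHOW TABLES IN DATABASE temporary_database;
```

No TRANSIENT keyword is needed on the table itself here, because tables created in a transient database are transient by definition.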

Do note that setting these options at the database level only changes the defaults for the objects created within that database. Time travel can still be set on individual tables with DATA_RETENTION_TIME_IN_DAYS, although transient tables only allow a value of 0 or 1. Fail-safe is different: every table created in a transient database is itself transient, so it cannot be protected by fail-safe.
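
A table-level override of the database default might look like the following sketch. The table load_scratch is a hypothetical example and is created in production_database only to show that a table can deviate from the 15-day database default.

```sql
-- Hypothetical table that opts out of time travel despite the database default of 15 days.
CREATE TABLE production_database.public.load_scratch (
    id INTEGER
)
DATA_RETENTION_TIME_IN_DAYS = 0;

-- The table-level value can be changed later without touching the database.
ALTER TABLE production_database.public.load_scratch
SET DATA_RETENTION_TIME_IN_DAYS = 1;

-- The retention_time column should now show the table-level value.
SHOW TABLES LIKE 'load_scratch' IN DATABASE production_database;
```

Only the time travel retention is a settable parameter. Whether a table has fail-safe follows from the table being permanent or transient, not from a parameter.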

Note that there is the ALTER DATABASE command as well, which can be used to change some of the properties after the database has been created. It is a powerful command that allows renaming the database, swapping a database with another database, and also resetting custom properties back to their defaults.
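
The three uses of ALTER DATABASE mentioned above can be sketched as follows. The names our_renamed_database, reporting_build, and reporting_live are hypothetical. The SWAP WITH statement only runs if both of those databases already exist.

```sql
-- Rename a database (hypothetical new name).
ALTER DATABASE our_first_database RENAME TO our_renamed_database;

-- Swap two databases in a single statement (both names are hypothetical and must exist).
ALTER DATABASE reporting_build SWAP WITH reporting_live;

-- Reset a custom property so the database falls back to the inherited default.
ALTER DATABASE production_database UNSET DATA_RETENTION_TIME_IN_DAYS;
```

After the UNSET, SHOW DATABASES should report the account-level retention value for production_database again.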

It is important to note that creating a database sets the current database of the session to the newly created database. That would mean that any subsequent data definition language (DDL) commands such as CREATE TABLE would create a table under that new database. This is like using the USE DATABASE command.
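
The effect on the session context can be observed with CURRENT_DATABASE(). The sketch below uses a hypothetical database name, session_demo_database, purely for illustration.

```sql
-- Creating a database makes it the current database for the session.
CREATE DATABASE session_demo_database;  -- hypothetical name
SELECT CURRENT_DATABASE();

-- Switch back explicitly before issuing further DDL.
USE DATABASE production_database;
SELECT CURRENT_DATABASE();

-- Alternatively, fully qualify object names so DDL does not depend on the session context.
CREATE TABLE production_database.public.context_demo (id INTEGER);  -- hypothetical table
```

Fully qualified names are the safer choice in scripts that create several databases, since each CREATE DATABASE silently changes where unqualified objects would land.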

There's more…

We will cover time travel and fail-safe in much more detail in subsequent chapters. We will also cover in depth how to create databases from shares and how to clone existing databases.