Book Image

Snowflake Cookbook

By : Hamid Mahmood Qureshi, Hammad Sharif
5 (1)
Book Image

Snowflake Cookbook

5 (1)
By: Hamid Mahmood Qureshi, Hammad Sharif

Overview of this book

Snowflake is a unique cloud-based data warehousing platform built from scratch to perform data management on the cloud. This book introduces you to Snowflake's unique architecture, which places it at the forefront of cloud data warehouses. You'll explore the compute model available with Snowflake, and find out how Snowflake allows extensive scaling through the virtual warehouses. You will then learn how to configure a virtual warehouse for optimizing cost and performance. Moving on, you'll get to grips with the data ecosystem and discover how Snowflake integrates with other technologies for staging and loading data. As you progress through the chapters, you will leverage Snowflake's capabilities to process a series of SQL statements using tasks to build data pipelines and find out how you can create modern data solutions and pipelines designed to provide high performance and scalability. You will also get to grips with creating role hierarchies, adding custom roles, and setting default roles for users before covering advanced topics such as data sharing, cloning, and performance optimization. By the end of this Snowflake book, you will be well-versed in Snowflake's architecture for building modern analytical solutions and understand best practices for solving commonly faced problems using practical recipes.
Table of Contents (12 chapters)

Managing tables

This recipe shows you how to create a table and insert data to explain different behaviors in storing data. Here you will be introduced to the different options that are available from a life cycle perspective, such as tables being permanent, temporary, volatile, and so on. Most of the concepts are not new, so the focus is going to be on the specifics related to Snowflake. We will start with a simple example that creates a table. We shall insert some sample data into it and then try out different variations on creating tables in Snowflake.

Getting ready

The following examples can be run either via the Snowflake web UI or the SnowSQL command-line client.

How to do it…

Let's start by creating a table for storing customer data. We shall start with the DDL statement for creating a table:

  1. The following example DDL statement will be executed to create a table called CUSTOMERS in Snowflake:
    CREATE TABLE customers (
      id              INT NOT NULL,
      last_name       VARCHAR(100) ,
      first_name      VARCHAR(100),
      email           VARCHAR(100),
      company         VARCHAR(100),
      phone           VARCHAR(100),
      address1        VARCHAR(150),
      address2        VARCHAR(150),
      city            VARCHAR(100),
      state           VARCHAR(100),
      postal_code     VARCHAR(15),
      country         VARCHAR(50)
    );

    The command should successfully execute, generating the following message:

    Figure 2.11 – Table created successfully

    Figure 2.11 – Table created successfully

  2. To confirm that the table has been generated as per the specification, we can run a DESCRIBE TABLE statement a shown:
    DESCRIBE TABLE customers;

    It should generate the following results in the Snowflake web UI:

    Figure 2.12 –  Output of the DESCRIBE statement

    Figure 2.12 – Output of the DESCRIBE statement

  3. Let's assume there was something wrong with a data type; for example, say ADDRESS1 and ADDRESS2 were supposed to be stored as a STRING data type. This can be addressed using the REPLACE TABLE statement, along with CREATE. This will overwrite the existing CUSTOMERS table:
    CREATE TABLE customers (
      id              INT NOT NULL,
      last_name       VARCHAR(100) ,
      first_name      VARCHAR(100),
      email           VARCHAR(100),
      company         VARCHAR(100),
      phone           VARCHAR(100),
      address1        STRING,
      address2        STRING,
      city            VARCHAR(100),
      state           VARCHAR(100),
      postal_code     VARCHAR(15),
      country         VARCHAR(50)
    );
  4. Let's verify whether the desired change has been successfully applied. For that, we can execute the DESCRIBE TABLE statement again as shown:
    DESCRIBE TABLE customers;

    It should generate the following result in the Snowflake web UI:

    Figure 2.13 – Output of the DESCRIBE statement after the changes in the table

    Figure 2.13 – Output of the DESCRIBE statement after the changes in the table

    Please note the data types. We shall discuss this in the How it works… section along with how REPLACE works and how it is a shorthand for a two-step process typically required for managing the life cycle of a table.

  5. Let's now load this table with some data before we continue with the rest of the recipe. To do so, run the following command:
    COPY INTO customers
    FROM s3://snowflake-cookbook/Chapter02/r3/customer.csv
    FILE_FORMAT = (TYPE = csv SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');

    This should complete with the following message:

    Figure 2.14 – Table loaded

    Figure 2.14 – Table loaded

  6. Let's now look at scenarios where the creation of a table is based on an existing table. A common pattern in the table life cycle involves deriving a table from an existing table. This is called CTAS or CREATE TABLE ... AS SELECT. To explain the use of CTAS, we will have some records inserted into the CUSTOMERS table. The dataset has 100 rows, as shown in the following screenshot (showing the first six columns of the table):
    Figure 2.15 – Sample data shown for the customer table

    Figure 2.15 – Sample data shown for the customer table

  7. The dataset will be copied into a new table that will have the same structure as the CUSTOMERS table. The statements to be executed are as follows:
    CREATE OR REPLACE TABLE 
    customers_deep_copy 
    AS 
    SELECT * 
    FROM customers;

    This should succeed with the following message:

    Figure 2.16 – Deep copy table created

    Figure 2.16 – Deep copy table created

    What we now have created is a deep copy of the CUSTOMERS table and a new table, which has received all data from CUSTOMERS and is called CUSTOMERS_DEEP_COPY. The deep copy means that the table structure and data has been copied, as opposed to a shallow copy, which would copy the table structure only. This copied table is now an independent copy that can have a life cycle of its own with no changes or side effects originating from any operations performed on the CUSTOMERS table. A SELECT QUERY statement on this new table would generate the same results as shown in the previous table.

  8. Now let's look at another table copying method – shallow copy. In certain situations, a new table is required to be created with the same structure as an existing table, but the data is not to be copied, in which case a shallow copy is created. This is achieved by using the LIKE keyword as provided by Snowflake:
    CREATE OR REPLACE TABLE 
    customers_shallow_copy 
    LIKE customers;

    This should succeed with the following message:

    Figure 2.17 – Shallow copy table created

    Figure 2.17 – Shallow copy table created

  9. This should create a table that has the same structure as the CUSTOMERS table but with no data in it:
    SELECT 
    COUNT(*)
    FROM 
    customers_shallow_copy;

    The result of the following count query is 0 rows, as shown:

    Figure 2.18 – New empty table created

    Figure 2.18 – New empty table created

  10. Until now, we have seen the creation of tables with permanent life – the table will exist and store data until explicitly dropped. This is the default behavior from a life cycle perspective. There are tables that handle transitory data and so they might be required only temporarily. For such scenarios, there are temporary and transient tables. Let's create these tables by running the following SQL statements:
    CREATE TEMPORARY TABLE customers_temp AS SELECT * FROM customers WHERE TRY_TO_NUMBER(postal_code) IS NOT NULL;
    CREATE TRANSIENT TABLE customers_trans AS AS SELECT * FROM customers WHERE TRY_TO_NUMBER(postal_code) IS NULL;
  11. The preceding SQL script will allow you to create two tables, customers_temp and customers_trans. The two tables are not permanent, but the tables have limitations. If you end the web UI session at this point, then the customers_temp table will not be recoverable after a re-login. Transient tables are available after a session has been closed and will retain data in a subsequent session created by user login; however, they don't consume fail-safe storage. This is an important mechanism for retaining data across sessions and can have applications in scenarios that require state management or in ETL jobs.

How it works…

REPLACE is actually a shorthand for a two-step process that's required when a table has to be deleted (dropped) and then recreated. That would typically be done by executing the two statements in sequence as follows:

DROP TABLE IF EXISTS
CREATE TABLE customers …

The deep and shallow copies can be explained by the following query and the result generated by Snowflake:

show tables like 'customers%';

This shows a table with three rows, each showing a summary of the three tables that we have generated previously, explaining the differences and similarities. The following table shows that the deep copies of the dataset are exactly the same while the shallow copy has been deprived of data, though the metadata is the same:

Figure 2.19 – The show tables command output showing the copies of the table

Figure 2.19 – The show tables command output showing the copies of the table

A thing to note in the preceding table is the kind column. The column is showing that the tables created have the kind attribute set as TABLE, which is the default type of table – a permanent table structure to be populated with data rows. (Please note that a select set of columns is being shown here for the sake of clarity.)

A local temporary table (also known as a volatile table) persists for the duration of the user session in which it was created and is not visible to other users. A temporary table's definition and contents are dropped at the end of the user session.

Transient tables are non-permanent tables, but unlike temporary tables, transient tables exist until explicitly dropped and are visible to any user with the appropriate privileges. Transient tables have a lower level of data protection than permanent tables. Data in a transient table may be lost in the event of a system failure. Transient tables should only be used for data that can be recreated in the event that the data is lost.

A show tables command for the customers_temp and customers_trans tables will show a table similar to the following table (please note that a limited set of columns is shown here for the sake of clarity):

Figure 2.20 – The show tables command output highlighting the temporary 
and transient nature of tables

Figure 2.20 – The show tables command output highlighting the temporary and transient nature of tables

Please note the content for the kind column. It shows that the tables are not permanent.

There's more…

One aspect of CREATE TABLE statements, CLONE, has been left for discussion in Chapter 8, Back to the Future with Time Travel. Temporary tables, however, cannot be cloned. Only permanent tables are fail-safe. Fail-safes will be discussed further in later chapters.