Book Image

Amazon Redshift Cookbook

By : Shruti Worlikar, Thiyagarajan Arumugam, Harshida Patel
Book Image

Amazon Redshift Cookbook

By: Shruti Worlikar, Thiyagarajan Arumugam, Harshida Patel

Overview of this book

Amazon Redshift is a fully managed, petabyte-scale AWS cloud data warehousing service. It enables you to build new data warehouse workloads on AWS and migrate on-premises traditional data warehousing platforms to Redshift. This book on Amazon Redshift starts by focusing on Redshift architecture, showing you how to perform database administration tasks on Redshift. You'll then learn how to optimize your data warehouse to quickly execute complex analytic queries against very large datasets. Because of the massive amount of data involved in data warehousing, designing your database for analytical processing lets you take full advantage of Redshift's columnar architecture and managed services. As you advance, you’ll discover how to deploy fully automated and highly scalable extract, transform, and load (ETL) processes, which help minimize the operational efforts that you have to invest in managing regular ETL pipelines and ensure the timely and accurate refreshing of your data warehouse. Finally, you'll gain a clear understanding of Redshift use cases, data ingestion, data management, security, and scaling so that you can build a scalable data warehouse platform. By the end of this Redshift book, you'll be able to implement a Redshift-based data analytics solution and have understood the best practice solutions to commonly faced problems.
Table of Contents (13 chapters)

Managing stored procedures

Stored procedures in Amazon Redshift are user-created objects using a Procedural Language/PostgreSQL (PL/pgSQL) procedural programming language. Stored procedures support both data definition language (DDL) and data manipulation language (DML). Stored procedures can take in input arguments but do not necessarily need to return results. PL/pgSQL also supports conditional logic, loops, and case statements. Stored procedures are commonly used to build reusable extract, transform, load (ETL) data pipelines and enable the database administrator (DBA) to automate routine administrative activities—for example, periodically dropping unused tables.

The SECURITY attribute controls who has privileges to access certain database objects.

Stored procedures can be created with security definer controls to allow execution of a procedure without giving access to underlying tables—for example, they can drop a table created by another user and enable the DBA to automate administrative activities.

Getting ready

To complete this recipe, you will need the following:

  • Access to the Amazon Web Services (AWS) Management Console
  • Access to any SQL interface such as a SQL client or query editor

How to do it…

In this recipe, we will start with creating a scalar Python-based UDF that will be used to parse an Extensible Markup Language (XML) input:

  1. Connect to Amazon Redshift using the SQL client, and copy and paste the following code to create a sp_cookbook stored procedure:
    Create schema cookbook;
    create or replace procedure sp_cookbook(indate in date, records_out INOUT refcursor) as
    $$
    declare
      integer_var int;
    begin
       RAISE INFO 'running first cookbook storedprocedure on date %',  indate;
       drop table if exists cookbook.cookbook_tbl;
      create table cookbook.cookbook_tbl
      (recipe_name varchar(50),
       recipe_date date
       );
       insert into cookbook.cookbook_tbl values('stored procedure', indate);
      GET DIAGNOSTICS integer_var := ROW_COUNT;
       RAISE INFO 'rows inserted into cookbook_tbl = %', integer_var;
        OPEN records_out FOR SELECT * FROM cookbook.cookbook_tbl;
    END;   
    $$ LANGUAGE plpgsql;

    This stored procedure is taking two parameters: indate is the input, and records_out serves as both an input and output parameter. This stored procedure uses DDL and DML statements. The current user is the owner of the stored procedure and is also the owner of the cookbook.cookbook_tbl table.

    Note

    Some older versions of SQL client tools may produce an "unterminated dollar-quoted string at or near "$$"error. Ensure that you have the latest version of the SQL client—for example, ensure you are using version 124 or higher for the SQL Workbench/J client.

  2. Now, let's execute the sp_cookbook stored procedure using the following statements:
    call sp_cookbook(current_date, 'inputcursor');
    fetch all from inputcursor;

    This is the expected output:

    Message
    running first cookbook storedprocedure on date 2020-12-13
    rows inserted into cookbook_tbl = 1
    recipe_name recipe_date
    stored procedure   2020-12-13 00:00:00
  3. To view a definition of the previously created stored procedure, you can run the following statement:
    SHOW PROCEDURE sp_cookbook(indate in date, records_out INOUT refcursor);
  4. We will now create another stored procedure with a security definer privilege:
    create or replace procedure public.sp_self_service(tblName in varchar(60)) as
                $$ 
    begin        
        RAISE INFO 'running sp_self_service to drop table %',  tblName;
        execute 'drop table if exists cookbook.' || tblName;
      RAISE INFO 'table dropped %',  tblName;
    END;   
    $$ LANGUAGE plpgsql
    SECURITY DEFINER;
  5. Let's create a user and check whether they have a permission to drop the cookbook.cookbook_tbl table. The user1 user does not have a permission to drop the table:
    create user user1 with password 'Cookbook1';
    grant execute on procedure public.sp_self_service(tblName in varchar(60)) to user1;
    set SESSION authorization  user1;
    select current_user;
    drop table cookbook.cookbook_tbl;

    This is the expected output:

    ERROR: 42501: permission denied for schema cookbook
  6. When user1 executes the sp_self_service stored procedure, the procedure runs with the security context of the owner of the procedure:
    set SESSION authorization  user1;
    select current_user;
    call public.sp_self_service('cookbook_tbl');

    This is the expected output:

    running sp_self_service to drop table cookbook_tbl
    table 

    This allows the user to drop the table without providing the full permissions for the tables in the cookbook schema.

How it works…

Amazon Redshift uses the PL/pgSQL procedural language for authoring the stored procedures. PL/pgSQL provides programmatic access that can be used to author control structures to the SQL language and allow complex computations. For example, you have a stored procedure that can create users and set up necessary access that meets your organizational needs—hence, rather than invoking several commands, this can now be done in a single step. You can find the complete reference to the PL/pgSQL procedural language at https://www.postgresql.org/docs/8.0/plpgsql.html and ready-to-use stored useful procedures at https://github.com/awslabs/amazon-redshift-utils/tree/master/src/StoredProcedures. The SECURITY access attribute of a stored procedure defines the privileges to access underlying database objects used. By default, an INVOKER is used to access the user privileges and the SECURITY DEFINER allows the procedure user to inherit the privileges of the owner.