Amazon Redshift Cookbook

By : Shruti Worlikar, Thiyagarajan Arumugam, Harshida Patel

Overview of this book

Amazon Redshift is a fully managed, petabyte-scale AWS cloud data warehousing service. It enables you to build new data warehouse workloads on AWS and migrate traditional on-premises data warehousing platforms to Redshift. This book starts by focusing on Redshift architecture, showing you how to perform database administration tasks on Redshift. You'll then learn how to optimize your data warehouse to quickly execute complex analytic queries against very large datasets. Because data warehousing involves massive amounts of data, designing your database for analytical processing lets you take full advantage of Redshift's columnar architecture and managed services. As you advance, you'll discover how to deploy fully automated and highly scalable extract, transform, and load (ETL) processes, which reduce the operational effort of managing regular ETL pipelines and ensure that your data warehouse is refreshed accurately and on time. Finally, you'll gain a clear understanding of Redshift use cases, data ingestion, data management, security, and scaling so that you can build a scalable data warehouse platform. By the end of this book, you'll be able to implement a Redshift-based data analytics solution and will understand best-practice solutions to commonly faced problems.

Connecting to Amazon Redshift programmatically using Python and the Redshift Data API

Python is widely used for data analytics due to its simplicity and ease of use. In this recipe, we will use Python to connect to Amazon Redshift through the Amazon Redshift Data API.

The Data API lets you access Amazon Redshift without using JDBC or ODBC drivers. You execute SQL commands on an Amazon Redshift data warehouse (serverless or provisioned cluster) by invoking a secure API endpoint provided by the Data API. The Data API submits SQL queries asynchronously, so you can monitor the status of a query and retrieve its results at a later time. The Data API is available through the AWS SDK for the major programming languages, such as Python, Go, Java, Node.js, PHP, Ruby, and C++.
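Because the same API call targets either a provisioned cluster or a serverless workgroup, it can help to see how the request arguments differ. The following is a minimal sketch, not part of the original recipe: build_execute_args is a hypothetical helper name, and it assumes the boto3 redshift-data execute_statement call, which accepts ClusterIdentifier for a provisioned cluster and WorkgroupName for a serverless workgroup.

```python
def build_execute_args(sql, database, secret_arn,
                       cluster_id=None, workgroup_name=None):
    """Build keyword arguments for the Data API execute_statement call.

    Hypothetical helper: pass cluster_id for a provisioned cluster or
    workgroup_name for a serverless workgroup, but not both.
    """
    # Exactly one target must be supplied.
    if (cluster_id is None) == (workgroup_name is None):
        raise ValueError("pass exactly one of cluster_id or workgroup_name")
    args = {"Sql": sql, "Database": database, "SecretArn": secret_arn}
    if cluster_id is not None:
        args["ClusterIdentifier"] = cluster_id   # provisioned cluster
    else:
        args["WorkgroupName"] = workgroup_name   # serverless workgroup
    return args
```

The resulting dictionary could then be unpacked into a call such as client.execute_statement(**args), where client is a redshift-data client created as shown later in this recipe.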

Getting ready

To complete this recipe, you will need:

  • An IAM user with access to Amazon Redshift, AWS Secrets Manager, and Amazon EC2.
  • Database credentials stored in AWS Secrets Manager, as described in Recipe 2 in the Appendix.
  • A Linux machine with terminal access, such as an Amazon EC2 instance, deployed in the same VPC as the Amazon Redshift cluster.
  • Python 3.6 or higher installed on the Linux instance where you will write and execute the code. If you have not installed Python, you can download it from https://www.python.org/downloads/.
  • The AWS SDK for Python (Boto3) installed on the Linux instance. You can see the getting started guide at https://aws.amazon.com/sdk-for-python/.
  • A security group attached to the Amazon Redshift cluster that allows connections from the Amazon EC2 Linux instance, so that the instance can execute the Python code against the cluster.
  • A VPC endpoint for AWS Secrets Manager, with a security group that allows the Linux instance to access the endpoint.
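The two local prerequisites (the Python version and the Boto3 installation) can be checked before starting. This is a small sketch using only the standard library; check_prerequisites is a hypothetical name, and the check does not verify IAM permissions, security groups, or VPC endpoints.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the list above


def check_prerequisites(min_python=MIN_PYTHON):
    """Report whether the local Python and Boto3 prerequisites are met."""
    return {
        # Compare only (major, minor) against the required minimum.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # find_spec returns None when the package is not importable.
        "boto3_installed": importlib.util.find_spec("boto3") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'yes' if ok else 'no'}")
```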

How to do it…

Follow these steps to use a Linux terminal to connect to Amazon Redshift using Python:

  1. Open the Linux terminal and install the latest AWS SDK for Python (Boto3) using the following command:
    pip install boto3
    
  2. Next, we will write the Python code. Type python on the Linux terminal and enter the following code. We first import the boto3 package and define a helper that establishes a session in the chosen region:
    import boto3
    import botocore.session as bc
    import json
    # Replace these placeholder values with your own cluster, database, region, and secret
    redshift_cluster_id = "myredshiftcluster"
    redshift_database = "dev"
    aws_region_name = "eu-west-1"
    secret_arn = "arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"
    def get_client(service, aws_region_name):
        # Wrap a botocore session in a boto3 session bound to the given region
        session = bc.get_session()
        s = boto3.Session(botocore_session=session, region_name=aws_region_name)
        return s.client(service)
    
  3. You can now create a Redshift Data API client (service name redshift-data) from the boto3.Session object:
    rsd = get_client('redshift-data', aws_region_name)
    
  4. We will execute a SQL statement to get the current date, using the secret ARN to retrieve the credentials. You can also execute DDL or DML statements. Query execution is asynchronous: the call returns immediately with an ExecuteStatementOutput response, which includes the statement ID:
    resp = rsd.execute_statement(
        SecretArn=secret_arn,
        ClusterIdentifier=redshift_cluster_id,
        Database=redshift_database,
        Sql="SELECT sysdate;"
    )
    queryId = resp['Id']
    print(f"asynchronous query execution: query id {queryId}")
    
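  5. Check the status of the query using describe_statement until it reaches a terminal state, and then print the number of records retrieved:
    import time
    desc = None
    while True:
        desc = rsd.describe_statement(Id=queryId)
        # Stop polling once the statement has finished, failed, or been aborted
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(1)  # pause between polls instead of looping continuously
    if desc["Status"] != "FINISHED":
        raise RuntimeError(desc.get("Error", desc["Status"]))
    print(desc["ResultRows"])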
    
  6. You can now retrieve the results of the query using get_statement_result, which returns the column metadata and the result records in a JSON-style structure. You can inspect it with the following statement:
    if desc and desc["ResultRows"] > 0:
        result = rsd.get_statement_result(Id=queryId)
        print("results JSON" + "\n")
        print(json.dumps(result, indent=3))
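The status check in step 5 can also be wrapped in a reusable function with a timeout. The following is a sketch under stated assumptions rather than the book's code: wait_for_statement and its timeout_s, poll_s, and sleep parameters are hypothetical names, and client stands for any object with a describe_statement(Id=...) method, such as the rsd client created in step 3.

```python
import time


def wait_for_statement(client, statement_id, timeout_s=60.0, poll_s=0.5,
                       sleep=time.sleep):
    """Poll describe_statement until the statement finishes or time runs out.

    Returns the final describe_statement response when the status is FINISHED.
    Raises RuntimeError on FAILED or ABORTED, and TimeoutError on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        desc = client.describe_statement(Id=statement_id)
        status = desc["Status"]
        if status == "FINISHED":
            return desc
        if status in ("FAILED", "ABORTED"):
            detail = desc.get("Error", "no error text")
            raise RuntimeError(f"statement {statement_id} {status}: {detail}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"statement {statement_id} still {status}")
        sleep(poll_s)  # wait before the next status check
```

With the client from step 3, a call might look like desc = wait_for_statement(rsd, queryId). The sleep argument is injectable only so that the function can be exercised without real delays.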
    

    Note

    The query results are available for retrieval only for 24 hours.
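The JSON returned by get_statement_result in step 6 holds each value as a small typed dictionary, so it is often convenient to flatten it into plain Python rows. The sketch below assumes the response shape documented for the Data API (ColumnMetadata entries with a name key, and Records as lists of fields keyed by stringValue, longValue, doubleValue, booleanValue, blobValue, or isNull). The function names and the sample data are illustrative and are not real query output.

```python
VALUE_KEYS = ("stringValue", "longValue", "doubleValue",
              "booleanValue", "blobValue")


def field_value(field):
    """Extract the Python value from one typed Data API field."""
    if field.get("isNull"):
        return None
    for key in VALUE_KEYS:
        if key in field:
            return field[key]
    raise ValueError(f"unrecognized field: {field!r}")


def rows_as_dicts(result):
    """Convert a get_statement_result response into a list of row dicts."""
    names = [col["name"] for col in result["ColumnMetadata"]]
    return [
        dict(zip(names, (field_value(f) for f in record)))
        for record in result["Records"]
    ]


# Hand-written sample in the assumed response shape (not real output).
sample = {
    "ColumnMetadata": [{"name": "id"}, {"name": "label"}],
    "Records": [
        [{"longValue": 1}, {"stringValue": "first"}],
        [{"longValue": 2}, {"isNull": True}],
    ],
}
print(rows_as_dicts(sample))
```

Large result sets are returned in pages with a NextToken value, which this sketch does not handle.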

The complete script for the above Python code is also available at https://github.com/PacktPublishing/Amazon-Redshift-Cookbook-2E/blob/main/Chapter01/Python_Connect_to_AmazonRedshift.py. It can be executed as python Python_Connect_to_AmazonRedshift.py.
