Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Amazon Redshift Cookbook
  • Table Of Contents Toc
Amazon Redshift Cookbook

Amazon Redshift Cookbook

By : Shruti Worlikar, Thiyagarajan Arumugam, Harshida Patel
4.8 (9)
close
close
Amazon Redshift Cookbook

Amazon Redshift Cookbook

4.8 (9)
By: Shruti Worlikar, Thiyagarajan Arumugam, Harshida Patel

Overview of this book

Amazon Redshift is a fully managed, petabyte-scale AWS cloud data warehousing service. It enables you to build new data warehouse workloads on AWS and migrate on-premises traditional data warehousing platforms to Redshift. This book on Amazon Redshift starts by focusing on Redshift architecture, showing you how to perform database administration tasks on Redshift. You'll then learn how to optimize your data warehouse to quickly execute complex analytic queries against very large datasets. Because of the massive amount of data involved in data warehousing, designing your database for analytical processing lets you take full advantage of Redshift's columnar architecture and managed services. As you advance, you’ll discover how to deploy fully automated and highly scalable extract, transform, and load (ETL) processes, which help minimize the operational efforts that you have to invest in managing regular ETL pipelines and ensure the timely and accurate refreshing of your data warehouse. Finally, you'll gain a clear understanding of Redshift use cases, data ingestion, data management, security, and scaling so that you can build a scalable data warehouse platform. By the end of this Redshift book, you'll be able to implement a Redshift-based data analytics solution and have understood the best practice solutions to commonly faced problems.
Table of Contents (13 chapters)
close
close

Connecting to Amazon Redshift using Jupyter Notebook

The Jupyter Notebook is an interactive web application that enables you to analyze your data interactively. Jupyter Notebook is widely used by users such as business analysts and data scientists to perform data wrangling and exploration. Using Jupyter Notebook, you can access all the historical data available in an Amazon Redshift data warehouse (serverless or provisioned cluster) and combine that with data in many other sources, such as an Amazon S3 data lake. For example, you might want to build a forecasting model based on historical sales data in Amazon Redshift combined with clickstream data available in the data lake. Jupyter Notebook is the tool of choice due to the versatility it provides with exploration tasks and the strong support from the open source community. This recipe covers the steps to connect to an Amazon Redshift data warehouse using Jupyter Notebook.

Getting ready

To complete this recipe, you will need:

How to do it…

The following steps will help you connect to an Amazon Redshift cluster using an Amazon SageMaker notebook:

  1. Open the AWS Console and navigate to the Amazon SageMaker service.
  2. Navigate to your notebook instance and open JupyterLab. When using the Amazon SageMaker notebook, find the notebook instance that was launched and click on the Open JupyterLab link, as shown in the following screenshot:
Figure 1.12 – Navigating to JupyterLab using the AWS Console

Figure 1.12 – Navigating to JupyterLab using the AWS Console

  1. Now, let’s install the Python driver libraries to connect to Amazon Redshift using the following code in the Jupyter Notebook. Set the kernel as conda_python3:
    !pip install psycopg2-binary
    ### boto3 is optional, but recommended to leverage the AWS Secrets Manager storing the credentials  Establishing a Redshift Connection
    !pip install boto3
    

    Important Note

    You can connect to an Amazon Redshift cluster using Python libraries such as Psycopg (https://pypi.org/project/psycopg2-binary/) or pg (https://www.postgresql.org/docs/7.3/pygresql.html) to connect to the Notebook. Alternatively, you can also use a JDBC, but for ease of scripting with Python, the following recipes will use either of the preceding libraries.

  1. Grant the Amazon SageMaker instance permission to use the stored secret. On the AWS Secrets Manager console, click on your secret and find the Secret ARN. Replace the ARN information in the resource section with the following JSON code:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "secretsmanager:GetResourcePolicy",
            "secretsmanager:GetSecretValue",
            "secretsmanager:DescribeSecret",
            "secretsmanager:ListSecretVersionIds"
          ],
          "Resource": [
            "arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"
          ]
        }
      ]
    }
    
  2. Now, attach this policy as an inline policy to the execution role for your SageMaker notebook instance. To do this, follow these steps:
    1. Navigate to the Amazon SageMaker (https://us-west-2.console.aws.amazon.com/sagemaker/) console.
    2. Select Notebook Instances.
    3. Click on your notebook instance (the one running this notebook, most likely).
    4. Under Permissions and Encryption, click on the IAM role link.
    5. You should now be on an IAM console that allows you to Add inline policy. Click on the link.
    6. On the Create Policy page that opens, click JSON and replace the JSON lines that appear with the preceding code block.
    7. Click Review Policy.
    8. On the next page select a human-friendly name for the policy and click Create policy.
  3. Finally, paste the ARN for your secret in the following code block of the Jupyter Notebook to connect to the Amazon Redshift cluster:
    # Put the ARN of your AWS Secrets Manager secret for your redshift cluster here:
    secret_arn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"
    # This will get the secret from AWS Secrets Manager.
    import boto3
    import json
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager'
    )
    get_secret_value_response = client.get_secret_value(
        SecretId=secret_arn
    )
    if 'SecretString' in get_secret_value_response:
        connection_info = json.loads(get_secret_value_response['SecretString'])
    else:
        print("ERROR: no secret data found")
    # Sanity check for credentials
    expected_keys = set(['user', 'password', 'host', 'database', 'port'])
    if not expected_keys.issubset(connection_info.keys()):
        print("Expected values for ",expected_keys)
        print("Received values for ",set(connection_info.keys()))
        print("Please adjust query or assignment as required!")
    # jdbc:redshift://HOST:PORT/DBNAME
    import time
    import psycopg2
    database = "dev"
    con=psycopg2.connect(
        dbname   = database,
        host     = connection_info["host"],
        port     = connection_info["port"],
        user     = connection_info["username"],
        password = connection_info["password"]
    )
    
  4. Run basic queries against the database. These queries make use of the cursor class to execute a basic query in Amazon Redshift:
    cur = con.cursor()
    cur.execute("SELECT sysdate")
    res = cur.fetchall()
    print(res)
    cur.close()
    
  5. Optionally, you can use the code here to connect to Amazon Redshift using Amazon SageMaker notebook: https://github.com/PacktPublishing/Amazon-Redshift-Cookbook-2E/blob/main/Chapter01/Connecting_to_AmazonRedshift_using_JupyterNotebook.ipynb.
Visually different images
CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Amazon Redshift Cookbook
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist download Download options font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon