Amazon Redshift Cookbook
The Jupyter Notebook is an interactive web application for exploring and analyzing data. It is widely used by business analysts and data scientists for data wrangling and exploration. Using Jupyter Notebook, you can access all the historical data available in an Amazon Redshift data warehouse (serverless or provisioned cluster) and combine it with data from many other sources, such as an Amazon S3 data lake. For example, you might want to build a forecasting model based on historical sales data in Amazon Redshift combined with clickstream data available in the data lake. Jupyter Notebook is the tool of choice due to its versatility for exploration tasks and the strong support from the open source community. This recipe covers the steps to connect to an Amazon Redshift data warehouse using Jupyter Notebook.
To complete this recipe, you will need:
The following steps will help you connect to an Amazon Redshift cluster using an Amazon SageMaker notebook:

Figure 1.12 – Navigating to JupyterLab using the AWS Console
Create a new notebook using the conda_python3 kernel and install the required libraries:
!pip install psycopg2-binary
### boto3 is optional, but recommended for retrieving credentials stored in AWS Secrets Manager
!pip install boto3
Establishing a Redshift Connection
Important Note
You can connect to an Amazon Redshift cluster from the notebook using Python libraries such as Psycopg (https://pypi.org/project/psycopg2-binary/) or PyGreSQL (https://www.postgresql.org/docs/7.3/pygresql.html). Alternatively, you can use a JDBC driver, but for ease of scripting with Python, the following recipes will use one of the preceding libraries.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetResourcePolicy",
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:ListSecretVersionIds"
],
"Resource": [
"arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"
]
}
]
}
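For the recipe's code to work, the secret referenced by the policy above should store the connection details as a JSON string. A minimal sketch of what that secret value might look like (all values here are hypothetical placeholders, not real endpoints or credentials):

```python
import json

# Hypothetical secret value; the keys match what the recipe's code reads.
secret_string = json.dumps({
    "host": "my-cluster.abc123xyz.eu-west-1.redshift.amazonaws.com",
    "port": 5439,
    "database": "dev",
    "username": "awsuser",
    "password": "example-password"  # never store real credentials in code
})

# The recipe parses the value exactly as Secrets Manager returns it:
connection_info = json.loads(secret_string)
print(sorted(connection_info.keys()))
```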
# Put the ARN of your AWS Secrets Manager secret for your redshift cluster here:
secret_arn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"
# This will get the secret from AWS Secrets Manager.
import boto3
import json
session = boto3.session.Session()
client = session.client(
    service_name='secretsmanager'
)
get_secret_value_response = client.get_secret_value(
    SecretId=secret_arn
)
if 'SecretString' in get_secret_value_response:
    connection_info = json.loads(get_secret_value_response['SecretString'])
else:
    print("ERROR: no secret data found")
# Sanity check for credentials
expected_keys = set(['username', 'password', 'host', 'database', 'port'])
if not expected_keys.issubset(connection_info.keys()):
    print("Expected values for ", expected_keys)
    print("Received values for ", set(connection_info.keys()))
    print("Please adjust query or assignment as required!")
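Note that the secret stores the login name under `username`, while `psycopg2.connect()` takes `user` (and `dbname` rather than `database`). A small hypothetical helper, not part of the original recipe, can bridge the two naming conventions:

```python
# Hypothetical helper mapping the secret's keys to the keyword
# arguments that psycopg2.connect() expects.
def to_psycopg2_kwargs(connection_info):
    return {
        "host": connection_info["host"],
        "port": connection_info["port"],
        "dbname": connection_info.get("database", "dev"),
        "user": connection_info["username"],
        "password": connection_info["password"],
    }

# Example with placeholder values:
sample = {"host": "h", "port": 5439, "database": "dev",
          "username": "awsuser", "password": "secret"}
kwargs = to_psycopg2_kwargs(sample)
print(kwargs["user"], kwargs["dbname"])
```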
# The cluster endpoint has the form jdbc:redshift://HOST:PORT/DBNAME
import psycopg2
database = "dev"
con = psycopg2.connect(
    dbname=database,
    host=connection_info["host"],
    port=connection_info["port"],
    user=connection_info["username"],
    password=connection_info["password"]
)
Use the cursor class to execute a basic query in Amazon Redshift:
cur = con.cursor()
cur.execute("SELECT sysdate")
res = cur.fetchall()
print(res)
cur.close()
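The cursor workflow above follows Python's DB-API 2.0, so the same open-execute-fetch-close pattern works with any compliant driver. Here is a self-contained sketch of the pattern using the standard library's sqlite3 module as a stand-in, so it runs without a cluster:

```python
import sqlite3

# An in-memory SQLite database stands in for the Redshift connection;
# the cursor calls are identical to the psycopg2 ones above.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("SELECT 1 + 1")
res = cur.fetchall()
print(res)  # [(2,)]
cur.close()
con.close()
```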