-
Book Overview & Buying
-
Table Of Contents
Amazon Redshift Cookbook
By :
Python is widely used for data analytics due to its simplicity and ease of use. We will use Python to connect using the Amazon Redshift Data API.
The Data API allows you to access Amazon Redshift without using the JDBC or ODBC drivers. You can execute SQL commands on an Amazon Redshift data warehouse (serverless or provisioned cluster), invoking a secure API endpoint provided by the Data API. The Data API ensures the SQL queries to be submitted asynchronously. You can now monitor the status of the query and retrieve your results at a later time. The Data API is supported by the major programming languages, such as Python, Go, Java, Node.js, PHP, Ruby, and C++, along with the AWS SDK.
To complete this recipe, you will need:
Follow these steps to use a Linux terminal to connect to Amazon Redshift using Python:
pip install boto3
python on the Linux terminal and start typing the following code. We will first import the boto3 package and establish a session:
import boto3
import json
redshift_cluster_id = "myredshiftcluster"
redshift_database = "dev"
aws_region_name = "eu-west-1"
secret_arn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"
def get_client(service, aws_region_name):
import botocore.session as bc
session = bc.get_session()
s = boto3.Session(botocore_session=session, region_name=region)
return s.client(service)
boto3.Session object using RedshiftData:
rsd = get_client('redshift-data')
ExecuteStatementOutput, which includes the statement ID:
resp = rsd.execute_statement(
SecretArn= secret_arn
ClusterIdentifier=redshift_cluster_id,
Database= redshift_database,
Sql="SELECT sysdate;"
)
queryId = resp['Id']
print(f"asynchronous query execution: query id {queryId}")
describe_statement and the number of records retrieved:
stmt = rsd.describe_statement(Id=queryId)
desc = None
while True:
desc = rsd.describe_statement(Id=queryId)
if desc["Status"] == "FINISHED":
break
print(desc["ResultRows"])
get_statement_result. get_statement_result returns a JSON-based metadata and result that can be verified using the below statement:
if desc and desc["ResultRows"] > 0:
result = rsd.get_statement_result(Id=queryId)
print("results JSON" + "\n")
print(json.dumps(result, indent = 3))
Note
The query results are available for retrieval only for 24 hours.
The complete script for the above Python code is also available at https://github.com/PacktPublishing/Amazon-Redshift-Cookbook-2E/blob/main/Chapter01/Python_Connect_to_AmazonRedshift.py. It can be executed as python Python_Connect_to_AmazonRedshift.py.