Book Image

Amazon Redshift Cookbook

By : Shruti Worlikar, Thiyagarajan Arumugam, Harshida Patel
Book Image

Amazon Redshift Cookbook

By: Shruti Worlikar, Thiyagarajan Arumugam, Harshida Patel

Overview of this book

Amazon Redshift is a fully managed, petabyte-scale AWS cloud data warehousing service. It enables you to build new data warehouse workloads on AWS and migrate on-premises traditional data warehousing platforms to Redshift. This book on Amazon Redshift starts by focusing on Redshift architecture, showing you how to perform database administration tasks on Redshift. You'll then learn how to optimize your data warehouse to quickly execute complex analytic queries against very large datasets. Because of the massive amount of data involved in data warehousing, designing your database for analytical processing lets you take full advantage of Redshift's columnar architecture and managed services. As you advance, you’ll discover how to deploy fully automated and highly scalable extract, transform, and load (ETL) processes, which help minimize the operational efforts that you have to invest in managing regular ETL pipelines and ensure the timely and accurate refreshing of your data warehouse. Finally, you'll gain a clear understanding of Redshift use cases, data ingestion, data management, security, and scaling so that you can build a scalable data warehouse platform. By the end of this Redshift book, you'll be able to implement a Redshift-based data analytics solution and have understood the best practice solutions to commonly faced problems.
Table of Contents (13 chapters)

Connecting to an Amazon Redshift cluster using Python

Python is widely used for data analytics due to its simplicity and ease of use. In this recipe, we will use Python programming to connect using the Amazon Redshift Data API.

The Data API allows you to access Amazon Redshift without the need to use the JDBC or ODBC drivers. You can execute SQL commands on an Amazon Redshift cluster by invoking a secure API endpoint provided by the Data API. The Data API ensures that your SQL queries will be submitted asynchronously. You can now monitor the status of the query and retrieve your results later. The Data API is supported on all major programming languages, including Python, Go, Java, Node.js, PHP, Ruby, and C++, along with the AWS SDK.

Getting ready

To complete this recipe, you will need to do the following:

  • Create an IAM user with access to Amazon Redshift, Amazon Secrets Manager, and Amazon EC2.
  • Store the database credentials in Amazon Secrets Manager using Recipe...