Amazon Redshift Cookbook
The Jupyter Notebook is an interactive web application for exploring and analyzing data. It is widely used by business analysts and data scientists for data wrangling and exploration. Using Jupyter Notebook, you can access all the historical data available in an Amazon Redshift data warehouse (serverless or provisioned cluster) and combine it with data from many other sources, such as an Amazon S3 data lake. For example, you might want to build a forecasting model based on historical sales data in Amazon Redshift combined with clickstream data available in the data lake. Jupyter Notebook is the tool of choice due to its versatility for exploration tasks and the strong support from the open source community. This recipe covers the steps to connect to an Amazon Redshift data warehouse using Jupyter Notebook.
To complete this recipe, you will need:
The following steps will help you connect to an Amazon Redshift cluster using an Amazon SageMaker notebook:

Figure 1.12 – Navigating to JupyterLab using the AWS Console
Create a new notebook using the conda_python3 kernel and install the required libraries:
!pip install psycopg2-binary
### boto3 is optional, but recommended for retrieving credentials stored in AWS Secrets Manager
!pip install boto3
Establishing a Redshift Connection
Important Note
You can connect to an Amazon Redshift cluster from the notebook using Python libraries such as Psycopg (https://pypi.org/project/psycopg2-binary/) or PyGreSQL (https://www.postgresql.org/docs/7.3/pygresql.html). Alternatively, you can use a JDBC driver, but for ease of scripting with Python, the following recipes will use one of the preceding libraries.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetResourcePolicy",
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:ListSecretVersionIds"
],
"Resource": [
"arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"
]
}
]
}
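For the recipe's code to work, the secret referenced by the policy above should store the connection details as a JSON string. A minimal sketch of what that secret value might look like (all values here are hypothetical placeholders, not real endpoints or credentials):

```python
import json

# Hypothetical secret value; the keys match what the recipe's code reads.
secret_string = json.dumps({
    "host": "my-cluster.abc123xyz.eu-west-1.redshift.amazonaws.com",
    "port": 5439,
    "database": "dev",
    "username": "awsuser",
    "password": "example-password"  # never store real credentials in code
})

# The recipe parses the value exactly as Secrets Manager returns it:
connection_info = json.loads(secret_string)
print(sorted(connection_info.keys()))
```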
# Put the ARN of your AWS Secrets Manager secret for your redshift cluster here:
secret_arn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"
# This will get the secret from AWS Secrets Manager.
import boto3
import json
session = boto3.session.Session()
client = session.client(
    service_name='secretsmanager'
)
get_secret_value_response = client.get_secret_value(
    SecretId=secret_arn
)
if 'SecretString' in get_secret_value_response:
    connection_info = json.loads(get_secret_value_response['SecretString'])
else:
    print("ERROR: no secret data found")
# Sanity check for credentials
expected_keys = set(['username', 'password', 'host', 'database', 'port'])
if not expected_keys.issubset(connection_info.keys()):
    print("Expected values for ", expected_keys)
    print("Received values for ", set(connection_info.keys()))
    print("Please adjust query or assignment as required!")
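Note that the secret stores the login name under `username`, while `psycopg2.connect()` takes `user` (and `dbname` rather than `database`). A small hypothetical helper, not part of the original recipe, can bridge the two naming conventions:

```python
# Hypothetical helper mapping the secret's keys to the keyword
# arguments that psycopg2.connect() expects.
def to_psycopg2_kwargs(connection_info):
    return {
        "host": connection_info["host"],
        "port": connection_info["port"],
        "dbname": connection_info.get("database", "dev"),
        "user": connection_info["username"],
        "password": connection_info["password"],
    }

# Example with placeholder values:
sample = {"host": "h", "port": 5439, "database": "dev",
          "username": "awsuser", "password": "secret"}
kwargs = to_psycopg2_kwargs(sample)
print(kwargs["user"], kwargs["dbname"])
```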
# The cluster endpoint has the form jdbc:redshift://HOST:PORT/DBNAME
import psycopg2
database = "dev"
con = psycopg2.connect(
    dbname=database,
    host=connection_info["host"],
    port=connection_info["port"],
    user=connection_info["username"],
    password=connection_info["password"]
)
Use the cursor class to execute a basic query in Amazon Redshift:
cur = con.cursor()
cur.execute("SELECT sysdate")
res = cur.fetchall()
print(res)
cur.close()
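The cursor workflow above follows Python's DB-API 2.0, so the same open-execute-fetch-close pattern works with any compliant driver. Here is a self-contained sketch of the pattern using the standard library's sqlite3 module as a stand-in, so it runs without a cluster:

```python
import sqlite3

# An in-memory SQLite database stands in for the Redshift connection;
# the cursor calls are identical to the psycopg2 ones above.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("SELECT 1 + 1")
res = cur.fetchall()
print(res)  # [(2,)]
cur.close()
con.close()
```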