
Amazon Redshift Cookbook

By: Shruti Worlikar, Thiyagarajan Arumugam, Harshida Patel

Overview of this book

Amazon Redshift is a fully managed, petabyte-scale AWS cloud data warehousing service. It enables you to build new data warehouse workloads on AWS and to migrate traditional on-premises data warehousing platforms to Redshift. This book starts by focusing on Redshift architecture and shows you how to perform database administration tasks on Redshift. You'll then learn how to optimize your data warehouse to quickly execute complex analytic queries against very large datasets. Because data warehousing involves massive amounts of data, designing your database for analytical processing lets you take full advantage of Redshift's columnar architecture and managed services. As you advance, you'll discover how to deploy fully automated and highly scalable extract, transform, and load (ETL) processes, which minimize the operational effort of managing regular ETL pipelines and ensure that your data warehouse is refreshed accurately and on time. Finally, you'll gain a clear understanding of Redshift use cases, data ingestion, data management, security, and scaling so that you can build a scalable data warehouse platform. By the end of this book, you'll be able to implement a Redshift-based data analytics solution and apply best-practice solutions to commonly faced problems.
Table of Contents (13 chapters)

Chapter 3: Loading and Unloading Data

In this chapter, we will delve into the data loading process, which moves transformed data from source systems into the tables of a target data warehouse. While data can be loaded into Amazon Redshift using an INSERT statement (as in other relational databases), it is more efficient to bulk load the data, given the volumes that a data warehouse handles. For example, a data warehouse table fed by an ordering system is usually loaded with the entire previous day's orders at once rather than one order at a time. Similarly, data in the data warehouse can be exported in bulk to other applications using the UNLOAD command.
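To make the bulk export idea concrete, here is a minimal sketch that only builds the text of a Redshift UNLOAD statement; it does not connect to a cluster. The function name `build_unload_sql`, the example bucket, and the IAM role ARN are hypothetical placeholders, and the choice of Parquet output is an assumption for illustration. The statement would still need to be run through a SQL client or driver connected to your cluster.

```python
def build_unload_sql(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return an UNLOAD statement that exports a query's results to S3.

    Hypothetical helper for illustration; it only builds the SQL text.
    """
    # UNLOAD takes the query as a single-quoted string literal, so any
    # single quotes inside the query are escaped by doubling them.
    quoted_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{quoted_query}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        # Assumed output format for this example; Redshift writes the
        # result set as one or more files under the given prefix.
        "FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    # Placeholder bucket and role ARN: replace with your own values.
    print(build_unload_sql(
        "SELECT * FROM orders WHERE status = 'SHIPPED'",
        "s3://example-bucket/exports/orders_",
        "arn:aws:iam::123456789012:role/ExampleRole",
    ))
```

Escaping the embedded quotes matters because the whole query is passed to UNLOAD as one string literal; without doubling, the literal `'SHIPPED'` would terminate the query string early.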

There are multiple ways of loading data into an Amazon Redshift cluster. The most common is to use the COPY command to load data from Amazon S3. This chapter covers the various ways you can load data into a Redshift cluster from different sources.
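As a rough illustration of a daily bulk load with COPY, the sketch below builds a statement that loads one day's CSV files from an S3 prefix. It assumes a hypothetical S3 layout of `s3://<bucket>/<table>/YYYY-MM-DD/` with a header row in each file; the function name, bucket, and role ARN are placeholders, not part of the book. COPY loads every object whose key starts with the given prefix, which lets Redshift read the day's files in parallel instead of inserting rows one at a time.

```python
from datetime import date, timedelta


def build_daily_copy_sql(table: str, bucket: str, load_date: date,
                         iam_role_arn: str) -> str:
    """Return a COPY statement that bulk loads one day's CSV files.

    Hypothetical helper for illustration; it only builds the SQL text.
    """
    # Assumed layout: each day's files land under s3://<bucket>/<table>/YYYY-MM-DD/
    prefix = f"s3://{bucket}/{table}/{load_date.isoformat()}/"
    return (
        f"COPY {table} "
        f"FROM '{prefix}' "  # all objects under this prefix are loaded
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV "
        "IGNOREHEADER 1;"  # skip the assumed header row in each file
    )


if __name__ == "__main__":
    # Load the previous day's files, as in the ordering-system example.
    yesterday = date.today() - timedelta(days=1)
    print(build_daily_copy_sql(
        "orders", "example-bucket", yesterday,
        "arn:aws:iam::123456789012:role/ExampleRole",
    ))
```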

The following recipes will be...