Getting Started with Amazon Redshift

By: Stefan Bauer

Overview of this book

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service. It provides an excellent approach to analyzing all your data using your existing business intelligence tools.

Getting Started with Amazon Redshift is an easy-to-read, descriptive guide that breaks down the complex topics of data warehousing and Amazon Redshift. You will learn the fundamentals of Redshift technology and how to implement your own Redshift cluster, through practical, real-world examples. This exciting new technology is a powerful tool in your arsenal of data management and this book is a must-have to implement and manage your next enterprise Data Warehouse.

Packed with detailed descriptions, diagrams, and explanations, Getting Started with Amazon Redshift will bring you along, regardless of your current level of understanding, to a point where you will feel comfortable with running your own Redshift cluster. The author's own experiences will give you an understanding of what you will need to consider when working with your own data. You will also learn about how compression has been implemented and what that means relative to a column store database structure. As you progress, you will gain an understanding of monitoring techniques, performance considerations, and what it will take to successfully run your Amazon Redshift cluster on a day-to-day basis. There truly is something in this book for everyone who is interested in learning about this technology.
Table of Contents (14 chapters)

Cluster configuration


Single node clusters should be deployed only for testing and development work. With a single node, there is no recovery from a node failure other than a snapshot restore. Multiple nodes not only provide parallel query and load operations, but also protect your data, as the blocks are replicated between nodes. Additionally, as we have seen, some functions run only on the leader node. In a single node cluster, the leader node and the compute node are one and the same, so not only do you fail to gain the benefits of parallel processing, but you also pay the penalty of not having a separate node to handle the leader node's tasks.

When provisioning your cluster, be sure to pick a maintenance window that will work best for your load times and availability needs. It is also best, particularly early on in Redshift's life cycle, to allow automatic patching for that maintenance window. Notices are published weekly...
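
The provisioning choices discussed in this section (node count, maintenance window, and automatic patching) can be sketched as a small Python helper that assembles the arguments for a cluster creation request. This is only an illustration, not the book's own tooling: the helper name `build_cluster_config`, the `production` flag, the cluster identifiers, and the node type shown are assumptions made for the example. The keys follow the parameter names used by the boto3 Redshift `create_cluster` call, which additionally requires master user credentials that are omitted here.

```python
import re

# Shape check only for a window such as "sun:06:00-sun:06:30".
# It does not verify any minimum window length the service may require.
_WINDOW_RE = re.compile(
    r"^(mon|tue|wed|thu|fri|sat|sun):([01]\d|2[0-3]):[0-5]\d"
    r"-(mon|tue|wed|thu|fri|sat|sun):([01]\d|2[0-3]):[0-5]\d$",
    re.IGNORECASE,
)


def build_cluster_config(cluster_id, node_type, node_count,
                         maintenance_window, production=True):
    """Assemble keyword arguments for a Redshift cluster creation request.

    Hypothetical helper: it only builds a dictionary and makes no AWS calls.
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if production and node_count < 2:
        # Mirrors the advice above: single node clusters are for
        # testing and development work only.
        raise ValueError("use two or more nodes outside of test/dev")
    if not _WINDOW_RE.match(maintenance_window):
        raise ValueError("expected a window like 'sun:06:00-sun:06:30'")

    config = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "PreferredMaintenanceWindow": maintenance_window,
        # Allow automatic patching during the maintenance window.
        "AllowVersionUpgrade": True,
    }
    if node_count == 1:
        config["ClusterType"] = "single-node"
    else:
        config["ClusterType"] = "multi-node"
        config["NumberOfNodes"] = node_count
    return config


if __name__ == "__main__":
    # Placeholder identifier and node type, chosen for illustration.
    print(build_cluster_config("analytics-prod", "dc2.large", 4,
                               "sun:06:00-sun:06:30"))
```

The resulting dictionary could then be passed, together with the master user name and password, to a call such as `boto3.client("redshift").create_cluster(**config)`. Keeping the validation separate from the API call makes it easy to review the chosen window and node count before anything is provisioned.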