Getting Started with Amazon Redshift

By: Stefan Bauer

Overview of this book

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service. It provides an excellent approach to analyzing all your data using your existing business intelligence tools. Getting Started with Amazon Redshift is an easy-to-read, descriptive guide that breaks down the complex topics of data warehousing and Amazon Redshift. You will learn the fundamentals of Redshift technology and how to implement your own Redshift cluster, through practical, real-world examples. This exciting new technology is a powerful tool in your arsenal of data management and this book is a must-have to implement and manage your next enterprise Data Warehouse. Packed with detailed descriptions, diagrams, and explanations, Getting Started with Amazon Redshift will bring you along, regardless of your current level of understanding, to a point where you will feel comfortable with running your own Redshift cluster. The author's own experiences will give you an understanding of what you will need to consider when working with your own data. You will also learn about how compression has been implemented and what that means relative to a column store database structure. As you progress, you will gain an understanding of monitoring techniques, performance considerations, and what it will take to successfully run your Amazon Redshift cluster on a day-to-day basis. There truly is something in this book for everyone who is interested in learning about this technology.

Cluster configuration

Single node clusters should only be deployed for testing and development work. With a single node, the only way to recover from a node failure is a snapshot restore. Multiple nodes not only provide parallel query and load operations, but also protect your data, as the blocks are replicated between nodes. Additionally, as we have seen, some functions run only on the leader node. In a single node cluster, the leader node and the compute node are one and the same, so not only do you fail to gain the benefits of parallel processing, but you also pay the penalty of not having a separate node to handle the leader node's tasks.

When provisioning your cluster, be sure to pick a maintenance window that will work best for your load times and availability needs. It is also best, particularly early on in Redshift's life cycle, to allow automatic patching for that maintenance window. Notices are published weekly...
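The recommendations above can be sketched as a small helper that assembles cluster settings before anything is provisioned. This is a minimal sketch, not the author's method: the function name `build_cluster_params`, the `production` flag, and the example identifier and node type are hypothetical. The dictionary keys are assumed to match the keyword arguments of the AWS SDK for Python (boto3) Redshift `create_cluster` call, which also needs credentials and other settings that are left out here.

```python
import re

# Maintenance windows are written as ddd:hh24:mi-ddd:hh24:mi in UTC,
# for example "sun:03:00-sun:03:30".
_DAY = r"(mon|tue|wed|thu|fri|sat|sun)"
_TIME = r"([01]\d|2[0-3]):[0-5]\d"
_WINDOW = re.compile(rf"^{_DAY}:{_TIME}-{_DAY}:{_TIME}$")


def build_cluster_params(identifier, node_type, node_count,
                         maintenance_window, production=True):
    """Return a dict of cluster settings (hypothetical helper).

    Rejects single node clusters unless production=False, because a
    single node offers neither parallelism nor block replication.
    """
    window = maintenance_window.lower()
    if not _WINDOW.match(window):
        raise ValueError(f"bad maintenance window: {maintenance_window!r}")
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if production and node_count < 2:
        raise ValueError("single node clusters are for testing and development only")

    params = {
        "ClusterIdentifier": identifier,
        "NodeType": node_type,
        "ClusterType": "multi-node" if node_count > 1 else "single-node",
        "PreferredMaintenanceWindow": window,
        # Let patches be applied automatically during the maintenance window.
        "AllowVersionUpgrade": True,
    }
    if node_count > 1:
        # The node count is only passed for multi-node clusters.
        params["NumberOfNodes"] = node_count
    return params


if __name__ == "__main__":
    # Illustrative values only; pick a window that avoids your load times.
    print(build_cluster_params("analytics-prod", "ra3.xlplus", 2,
                               "sun:03:00-sun:03:30"))
```

Building and validating the settings separately from the provisioning call makes it easy to review the maintenance window and node count before a cluster is created.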
