
Mastering SAS Programming for Data Warehousing

By: Monika Wahi

Overview of this book

SAS is used for many tasks in developing and maintaining data warehouses, thanks to its reputation for handling ‘big data’. This book will help you learn the pros and cons of storing data in SAS. As you progress, you’ll understand how to document and design extract-transform-load (ETL) protocols for SAS processes. Later, you’ll focus on how SAS arrays and macros can help standardize ETL. The book will also help you examine approaches for serving up data using SAS and explore how connecting SAS to other systems can enhance the data warehouse user’s experience. By the end of this data management book, you will have a fundamental understanding of the roles SAS can play in a warehouse environment and be able to choose wisely when designing data warehousing processes that involve SAS.
Table of Contents (18 chapters)
Section 1: Managing Data in a SAS Data Warehouse
Section 2: Using SAS for Extract-Transform-Load (ETL) Protocols in a Data Warehouse
Section 3: Using SAS When Serving Warehouse Data to Users

Questions

  1. Why is it helpful to consider both analyst and developer users in a data warehouse or data lake?

  2. Why do analyst users of data lakes and developer users of data warehouses need extensive documentation on source datasets?

  3. What are the pros and cons of having analysts access a data lake server directly?

  4. How does having multiple foreign keys in a data warehouse make it more useful to analysts?

  5. Imagine you are hosting an annual dataset in your data warehouse. One year when you receive the dataset, you learn that it includes a new categorical variable named ADJPRICE that you find valuable. For the next two years, you receive the dataset with ADJPRICE coded according to the same system, but in the third year, you receive the dataset without ADJPRICE and with ADJPRICE2, which is coded slightly differently from ADJPRICE. If you were to make a crosswalk variable to handle ADJPRICE and ADJPRICE2 in datasets over all these years, what coding would it...