Transforming Data to Optimize for Analytics

Data Engineering with AWS - Second Edition

By: Gareth Eagar

4.8 (31)

Overview of this book

This book, authored by a Senior Data Architect with 25 years of experience, helps you gain expertise in the AWS ecosystem for data engineering. This revised edition updates every chapter to cover the latest AWS services and features, provides a refreshed view on data governance, and introduces a new section on building modern data platforms. You will learn how to implement a data mesh, work with open table formats such as Apache Iceberg, and apply DataOps practices for automation and observability. You will begin by exploring core concepts and essential AWS tools used by data engineers, along with modern data management approaches. You will then design and build data pipelines, review raw data sources, transform data, and understand how it is consumed by various stakeholders. The book also covers data governance, populating data marts and warehouses, and how a data lakehouse fits into the architecture. You will explore AWS tools for analysis, SQL queries, and visualizations, and learn how AI and machine learning generate insights from data. Later chapters cover transactional data lakes, data meshes, and building a complete AWS data platform. By the end, you will be able to confidently implement data engineering pipelines on AWS.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Section 1: AWS Data Engineering Concepts and Trends

An Introduction to Data Engineering

Technical requirements

The rise of big data as a corporate asset

The challenges of ever-growing datasets

The role of the data engineer as a big data enabler

The benefits of the cloud when building big data analytic solutions

Hands-on – creating and accessing your AWS account

Summary

Data Management Architectures for Analytics

Technical requirements

The evolution of data management for analytics

A deeper dive into data warehouse concepts and architecture

An overview of data lake architecture and concepts

Bringing together the best of data warehouses and data lakes

Hands-on – using the AWS Command Line Interface (CLI) to create Simple Storage Service (S3) buckets

Summary

The AWS Data Engineer’s Toolkit

Technical requirements

An overview of AWS services for ingesting data

An overview of AWS services for transforming data

An overview of AWS services for orchestrating big data pipelines

An overview of AWS services for consuming data

Hands-on – triggering an AWS Lambda function when a new file arrives in an S3 bucket

Summary

Data Governance, Security, and Cataloging

Technical requirements

The many different aspects of data governance

Data security, access, and privacy

Data quality, data profiling, and data lineage

Business and technical data catalogs

AWS services that help with data governance

Hands-on – configuring Lake Formation permissions

Summary

Section 2: Architecting and Implementing Data Engineering Pipelines and Transformations

Architecting Data Engineering Pipelines

Technical requirements

Approaching the data pipeline architecture

Identifying data consumers and understanding their requirements

Identifying data sources and ingesting data

Identifying data transformations and optimizations

Loading data into data marts

Wrapping up the whiteboarding session

Hands-on – architecting a sample pipeline

Summary

Ingesting Batch and Streaming Data

Technical requirements

Understanding data sources

Ingesting data from a relational database

Ingesting streaming data

Hands-on – ingesting data with AWS DMS

Hands-on – ingesting streaming data

Summary

Transforming Data to Optimize for Analytics

Technical requirements

Overview of how transformations can create value

Types of data transformation tools

Common data preparation transformations

Common business use case transformations

Working with Change Data Capture (CDC) data

Hands-on – joining datasets with AWS Glue Studio

Summary

Identifying and Enabling Data Consumers

Technical requirements

Understanding the impact of data democratization

Meeting the needs of business users with data visualization

Meeting the needs of data analysts with structured reporting

Meeting the needs of data scientists and ML models

Hands-on – creating data transformations with AWS Glue DataBrew

Summary

A Deeper Dive into Data Marts and Amazon Redshift

Technical requirements

Extending analytics with data warehouses/data marts

What not to do – anti-patterns for a data warehouse

Redshift architecture review and storage deep dive

Designing a high-performance data warehouse

Moving data between a data lake and Redshift

Exploring advanced Redshift features

Hands-on – deploying a Redshift Serverless cluster and running Redshift Spectrum queries

Summary

Orchestrating the Data Pipeline

Technical requirements

Understanding the core concepts for pipeline orchestration

Examining the options for orchestrating pipelines in AWS

Hands-on – orchestrating a data pipeline using AWS Step Functions

Summary

Section 3: The Bigger Picture: Data Analytics, Data Visualization, and Machine Learning

Ad Hoc Queries with Amazon Athena

Technical requirements

An introduction to Amazon Athena

Tips and tricks to optimize Amazon Athena queries

Exploring advanced Athena functionality

Managing groups of users with Amazon Athena workgroups

Hands-on – creating an Amazon Athena workgroup and configuring Athena settings

Hands-on – switching workgroups and running queries

Summary

Visualizing Data with Amazon QuickSight

Technical requirements

Representing data visually for maximum impact

Understanding Amazon QuickSight’s core concepts

Ingesting and preparing data from a variety of sources

Creating and sharing visuals with QuickSight analyses and dashboards

Understanding QuickSight’s advanced features

Hands-on – creating a simple QuickSight visualization

Summary

Enabling Artificial Intelligence and Machine Learning

Technical requirements

Understanding the value of AI and ML for organizations

Exploring AWS services for ML

Exploring AWS services for AI

Building generative AI solutions on AWS

Common use cases for LLMs

Hands-on – reviewing reviews with Amazon Comprehend

Summary

Section 4: Modern Strategies: Open Table Formats, Data Mesh, DataOps, and Preparing for the Real World

Building Transactional Data Lakes

Technical requirements

What does it mean for a data lake to be transactional?

An overview of Delta Lake, Apache Hudi, and Apache Iceberg

AWS service integrations for building transactional data lakes

Hands-on – Working with Apache Iceberg tables in AWS

Summary

Implementing a Data Mesh Strategy

Technical requirements

What is a data mesh?

Challenges that a data mesh approach attempts to resolve

The organizational and technical challenges of building a data mesh

AWS services that help enable a data mesh approach

A sample architecture for a data mesh on AWS

Hands-on – Setting up Amazon DataZone

Summary

Building a Modern Data Platform on AWS

Technical requirements

Goals of a modern data platform

Deciding whether to build or buy a data platform

DataOps as an approach to building data platforms

Hands-on – automated deployment of data platform components and data transformation code

Summary

Wrapping Up the First Part of Your Learning Journey

Technical requirements

Understanding the complexities of real-world data environments

Examining examples of real-world data pipelines

Imagining the future – a look at emerging trends

Hands-on – cleaning up your AWS account

Summary

Other Books You May Enjoy

Index
