Getting Started with Data Analytics | Data Engineering on AWS

Data Engineering on AWS - The Complete Training

By: Ashish Prajapati

Overview of this course

This course begins by laying the foundation of data analytics and introducing AWS data engineering services. You’ll start with AWS Glue, learning to catalog, transform, and manage data using workflows, job bookmarks, and data quality checks, followed by visual data preparation with Glue DataBrew. Next, you’ll move into data warehousing with Amazon Redshift, from cluster creation to serverless deployment and performance tuning. The focus then shifts to real-time data processing with Amazon Kinesis and Amazon MSK, covering stream management, Flink applications, and Kafka integration. You’ll then explore big data processing using Amazon EMR, covering MapReduce, Spark, and cost-effective serverless execution. From there, the course guides you through building data lakes with AWS Lake Formation and querying them efficiently with Amazon Athena. In the final stages, you’ll visualize data using Amazon QuickSight and orchestrate pipelines through Step Functions and AppFlow. You’ll also gain experience with AWS data migration tools such as DMS and DataSync. The course concludes with related AWS services, including Lambda, S3, EC2, and DynamoDB, so that you can design and manage complete, scalable data platforms in the cloud.

Introduction: Data Is the New Oil

Introduction

Know Your Trainer

Getting Started with Data Analytics

Data Engineering on AWS

Basic Terminologies

AWS Glue: Catalog and Process Your Data

Glue Data Catalog

Glue ETL: Part 1

Glue ETL: Part 2

Glue ETL: Part 3

Workflows

Job Bookmark

Execution Type

Data Quality: Part 1

Data Quality: Part 2

Glue DataBrew

Additional Features

Amazon Redshift: A Data Warehouse in AWS

Amazon Redshift

Architecture

Creating a Cluster

Query Editor v2

Distribution Styles

Cluster Operations

Data API

Redshift Spectrum

Redshift Serverless: Part 1

Redshift Serverless: Part 2

Materialized Views

WLM and Concurrency

DataShare

Additional Information

Processing Streaming Data on Amazon Kinesis and Amazon MSK

What Is Streaming Data?

Streaming Services in AWS

Amazon Kinesis Family

Amazon Kinesis Data Streams

Capacity Mode

Shard Iterators

Kinesis Data Generator

Data Stream Producers

Data Stream Consumer

Enhanced Fan-Out

Amazon Kinesis Firehose

Dynamic Partitioning

Data Stream vs. Data Firehose

Managed Service for Apache Flink

Flink Application

Flink Studio

Apache Kafka

Amazon Managed Service for Kafka

MSK Cluster

Kafka Topic

Send and Receive Messages

Amazon MSK Serverless

MSK Provisioned vs. Serverless

Amazon MSK Connect

Amazon Kinesis vs. Amazon MSK

Running Big Data Workloads on Amazon EMR

What Is Big Data?

MapReduce

Big Data Ecosystem

Amazon EMR

Storage for EMR

Creating EMR Cluster: Part 1

Creating EMR Cluster: Part 2

Migration

Amazon EMR Serverless

Cost Optimization

Building Data Lakes on AWS

What Is a Data Lake?

Data Warehouse vs. Data Lake

AWS Lake Formation

How It Works?

Setting Up a Data Lake: Part 1

Setting Up a Data Lake: Part 2

Data Lake Permissions

Tag-Based Permissions

Open Table Formats

Query Your Data Using Amazon Athena

Why Use Amazon Athena?

How It Works?

Optimizing Queries in Athena: Part 1

Optimizing Queries in Athena: Part 2

Workgroups

Federated Query: Part 1

Federated Query: Part 2

Visualize Your Data Using Amazon QuickSight

Data Visualization

Getting Started

Integration with Amazon Athena

Orchestrating Your Data Pipeline

Which One to Choose?

Amazon AppFlow

AWS Data Exchange

AWS Step Functions: Part 1

AWS Step Functions: Part 2

AWS Step Functions: Part 3

Data Migration Services in AWS

A Note Before You Proceed

Migrating Your Databases to AWS

AWS DMS: Part 1

AWS DMS: Part 2

AWS DMS: Part 3

Migrating Your Data to AWS

AWS DataSync: Part 1

AWS DataSync: Part 2

AWS Transfer Family: Part 1

AWS Transfer Family: Part 2

AWS Snow Family

Going Beyond AWS Analytics Services

A Note Before You Proceed

AWS Lambda: Important Features

AWS Lambda: Concurrency

AWS Lambda: Layers

AWS Lambda: VPC Connectivity

AWS Lambda: Permissions

Amazon S3: Encryption

Amazon S3: Glacier Storage Tiers

Amazon S3: Events Notification

Amazon S3: Lifecycle Policies

Amazon S3: Intelligent Tier

Amazon S3: Select

Amazon S3: Object Lambda

Amazon EC2: EBS Volumes

Amazon RDS: Performance Insights

Amazon DynamoDB: Streams

Amazon DynamoDB: Global Tables

Amazon DynamoDB: Auto Scaling

Final Note
