Modern Data Architecture on AWS

By: Behram Irani

Overview of this book

Many IT leaders and professionals are adept at extracting data from a particular type of database and deriving value from it. However, designing and implementing an enterprise-wide holistic data platform with purpose-built data services, all seamlessly working in tandem with the least amount of manual intervention, still poses a challenge. This book will help you explore end-to-end solutions to common data, analytics, and AI/ML use cases by leveraging AWS services. The chapters systematically take you through all the building blocks of a modern data platform, including data lakes, data warehouses, data ingestion patterns, data consumption patterns, data governance, and AI/ML patterns. Using real-world use cases, each chapter highlights the features and functionalities of numerous AWS services to enable you to create a scalable, flexible, performant, and cost-effective modern data platform. By the end of this book, you’ll be equipped with all the necessary architectural patterns and be able to apply this knowledge to efficiently build a modern data platform for your organization using AWS services.

Challenges with on-premises data systems

As data grew exponentially, so did the on-premises systems. However, visible cracks started to appear in the legacy way of architecting data and analytics use cases.

The hardware used to process, store, and consume data had to be procured up front, and then installed and configured before it was ready for use. This created operational overhead and risk at every stage: procuring the hardware, provisioning it, installing software, and maintaining the system on an ongoing basis. To accommodate future data growth, teams also had to estimate additional capacity well in advance. Hardware elasticity did not exist, so scalability risks surfaced whenever data volumes grew suddenly or the business expanded into new markets.

Buying all this extra hardware up front also required a large capital expenditure, with the extra capacity lying unused much of the time. Software licenses were expensive too, adding to overall IT costs. Even with the hardware bought up front, it was difficult to sustain the data platform’s performance. As data volumes grew, latency crept in and degraded the performance of certain critical systems.
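The trade-off between idle capacity and scalability risk can be illustrated with some simple arithmetic. The following is a minimal sketch, not something taken from a real system: the function name `capacity_outlook` and all the figures (400 TB provisioned, 100 TB of initial data, 50% annual growth) are hypothetical values chosen only to show the shape of the problem.

```python
def capacity_outlook(provisioned_tb, initial_tb, annual_growth, years):
    """Compare a fixed, up-front hardware purchase against growing data volume.

    All inputs are illustrative assumptions:
      provisioned_tb - storage bought on day one (it cannot shrink or grow)
      initial_tb     - data volume at year 0
      annual_growth  - yearly growth rate as a fraction (0.5 means 50%)
      years          - number of years to project
    """
    rows = []
    for year in range(years + 1):
        demand_tb = initial_tb * (1 + annual_growth) ** year
        used_tb = min(demand_tb, provisioned_tb)
        rows.append({
            "year": year,
            "demand_tb": demand_tb,
            # Capacity that was paid for but sits unused.
            "idle_tb": provisioned_tb - used_tb,
            # Demand that the fixed hardware can no longer absorb.
            "shortfall_tb": max(demand_tb - provisioned_tb, 0.0),
        })
    return rows


if __name__ == "__main__":
    # Hypothetical scenario: 400 TB bought up front, 100 TB of data today.
    for row in capacity_outlook(400, 100, 0.5, 4):
        print(row)
```

Under these assumed numbers, most of the purchased capacity is idle in the early years, and by the final year demand exceeds what was provisioned. Both effects come from the same cause: capacity is fixed at purchase time while demand keeps moving.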

As data grew into big data, it was no longer just structured; many business use cases required semi-structured data, such as JSON files, and even unstructured data, such as images and PDF files. In subsequent chapters, we will go through some use cases that involve these different types of data.

As the sources of data grew, so did the number of ETL pipelines, and managing them became cumbersome. On top of that, with so much data movement, data was duplicated in multiple places, which made it difficult to establish a single source of truth.

On the flip side, with so many data sources and data owners within an organization, data became siloed, which made it difficult to share across different lines of business (LOBs).

Most enterprise data was stored either in an OLTP system, such as an RDBMS, or in an OLAP system, such as a data warehouse. As a result, organizations tried to solve most of their new use cases using the systems they had already invested in so heavily. The challenge was that these systems were built and optimized for specific types of operations only. It soon became evident that other types of data and analytics use cases needed their own purpose-built systems to meet performance requirements.

Lastly, as businesses expanded into other geographies, these systems had to be extended to other locations. A lot of time, effort, and money was spent scaling the data platform and making it resilient to failures.
