Machine Learning Engineering on AWS - Second Edition
Before training machine learning (ML) models, organizations typically collect, organize, and aggregate raw data from applications, databases, logs, and streaming systems in a centralized storage solution, such as a data warehouse or a data lake. Today, teams can get the best of both worlds by building transactional data lake architectures with Amazon S3 Tables. S3 Tables stores data in the Apache Iceberg open table format, which lets you organize data into structured logical tables that behave much like tables in a traditional database while preserving the scalability and flexibility of object storage. For large-scale data processing and transformation workloads, you can combine S3 Tables with distributed processing solutions such as Amazon EMR (formerly Amazon Elastic MapReduce) and Apache Spark to clean, filter, aggregate, and engineer features from raw and semi-structured datasets before they are used for model training and...
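The S3 Tables plus Spark workflow described above can be sketched in PySpark as follows. This is a minimal sketch, not a definitive implementation. It assumes Spark 3.x on Amazon EMR with the Amazon S3 Tables Catalog for Apache Iceberg runtime JAR on the classpath. The catalog name, table bucket ARN, namespaces, table names, and columns (`user_id`, `amount`) are hypothetical placeholders. The first helper only builds the Spark settings that register a table bucket as an Iceberg catalog. The other two functions import `pyspark` lazily, so you can inspect the configuration without Spark installed.

```python
# Minimal sketch, assuming Spark 3.x on Amazon EMR with the S3 Tables Catalog for
# Apache Iceberg runtime JAR available. All catalog, bucket, namespace, table,
# and column names used with these helpers are hypothetical placeholders.


def s3_tables_spark_conf(catalog: str, table_bucket_arn: str) -> dict[str, str]:
    """Return Spark settings that register an S3 table bucket as an Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enable Iceberg's SQL extensions in Spark.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Register the catalog name with Spark as an Iceberg SparkCatalog.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Delegate catalog operations to the S3 Tables catalog implementation.
        f"{prefix}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        # Point the catalog at the table bucket, e.g. arn:aws:s3tables:<region>:<account>:bucket/<name>.
        f"{prefix}.warehouse": table_bucket_arn,
    }


def build_session(conf: dict[str, str]):
    """Create a SparkSession with the given settings (requires pyspark)."""
    from pyspark.sql import SparkSession  # lazy import: the config helper works without Spark

    builder = SparkSession.builder.appName("feature-engineering")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def engineer_features(spark, catalog: str):
    """Clean, filter, and aggregate hypothetical raw events into per-user features."""
    from pyspark.sql import functions as F

    raw = spark.table(f"{catalog}.raw_data.events")  # hypothetical source table
    features = (
        raw.dropna(subset=["user_id", "amount"])  # clean: drop incomplete rows
        .filter(F.col("amount") > 0)  # filter: keep positive amounts only
        .groupBy("user_id")  # aggregate per user
        .agg(
            F.count("*").alias("txn_count"),
            F.avg("amount").alias("avg_amount"),
        )
    )
    # Make sure the target namespace exists, then write the features as an Iceberg table.
    spark.sql(f"CREATE NAMESPACE IF NOT EXISTS {catalog}.features")
    features.writeTo(f"{catalog}.features.user_features").createOrReplace()
    return features
```

On a cluster, you would call `build_session(s3_tables_spark_conf("s3tablesbucket", "<table-bucket-arn>"))` and pass the resulting session to `engineer_features`. Keeping the catalog settings in a plain dictionary makes them easy to reuse as `--conf` flags for `spark-submit` or in an EMR configuration.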