Book Image

Applied Machine Learning and High-Performance Computing on AWS

By : Mani Khanuja, Farooq Sabir, Shreyas Subramanian, Trenton Potgieter
Book Image

Applied Machine Learning and High-Performance Computing on AWS

By: Mani Khanuja, Farooq Sabir, Shreyas Subramanian, Trenton Potgieter

Overview of this book

Machine learning (ML) and high-performance computing (HPC) on AWS run compute-intensive workloads across industries and emerging applications. Its use cases can be linked to various verticals, such as computational fluid dynamics (CFD), genomics, and autonomous vehicles. This book provides end-to-end guidance, starting with HPC concepts for storage and networking. It then progresses to working examples on how to process large datasets using SageMaker Studio and EMR. Next, you’ll learn how to build, train, and deploy large models using distributed training. Later chapters also guide you through deploying models to edge devices using SageMaker and IoT Greengrass, and performance optimization of ML models, for low latency use cases. By the end of this book, you’ll be able to build, train, and deploy your own large-scale ML application, using HPC on AWS, following industry best practices and addressing the key pain points encountered in the application life cycle.
Table of Contents (20 chapters)
1
Part 1: Introducing High-Performance Computing
6
Part 2: Applied Modeling
13
Part 3: Driving Innovation Across Industries

Reviewing the AWS services for data analysis

AWS provides multiple services that are geared to help the data scientist analyze either structured, semi-structured, or unstructured data at scale. A common style across all these services is to provide users with the flexibility of choice to match the right aspects of each service as it applies to the use case. At times, it may seem confusing to the user which service to leverage for their use case.

Thus, in this section, we will map some of the AWS capabilities to the methodologies we’ve reviewed in the previous section.

Unifying the data into a common store

To address the requirement of storing the relevant global population of data from multiple sources in a common store, AWS provides the Amazon Simple Storage Service (S3) object storage service, allowing users to store structured, semi-structured, and unstructured data as objects within buckets.

Note

If you are unfamiliar with the S3 service, how it works, and...