Applied Machine Learning and High-Performance Computing on AWS

By: Mani Khanuja, Farooq Sabir, Shreyas Subramanian, Trenton Potgieter

Overview of this book

Machine learning (ML) and high-performance computing (HPC) on AWS run compute-intensive workloads across industries and emerging applications. Their use cases span verticals such as computational fluid dynamics (CFD), genomics, and autonomous vehicles. This book provides end-to-end guidance, starting with HPC concepts for storage and networking. It then progresses to working examples of how to process large datasets using SageMaker Studio and EMR. Next, you'll learn how to build, train, and deploy large models using distributed training. Later chapters guide you through deploying models to edge devices using SageMaker and IoT Greengrass, and through optimizing the performance of ML models for low-latency use cases. By the end of this book, you'll be able to build, train, and deploy your own large-scale ML application using HPC on AWS, following industry best practices and addressing the key pain points encountered in the application life cycle.

Table of Contents (20 chapters)

Part 1: Introducing High-Performance Computing
Part 2: Applied Modeling
Part 3: Driving Innovation Across Industries

The high availability of model endpoints

Amazon SageMaker provides fault tolerance and high availability for deployed endpoints. In this section, we will discuss the features and options of the AWS cloud infrastructure and Amazon SageMaker that we can use to ensure that our endpoints are fault-tolerant, resilient, and highly available.

Deployment on multiple instances

SageMaker gives us the option of deploying an endpoint on multiple instances. This protects against instance failures: if one instance goes down, the other instances can still serve inference requests. In addition, when an endpoint is deployed on multiple instances and an Availability Zone outage occurs or an instance fails, SageMaker automatically tries to distribute the instances across different Availability Zones, thereby improving the resiliency of the endpoint. It is also good practice to deploy endpoints using small instance types spread across different Availability Zones, so that losing any one instance removes only a small share of the serving capacity.
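
As a minimal sketch of the multi-instance deployment described above, the following Python code uses the boto3 SageMaker client to create an endpoint configuration with more than one instance behind a single production variant. The helper names (build_production_variant, create_multi_instance_endpoint), the variant name AllTraffic, the default instance type ml.m5.large, and the "-config" suffix are assumptions made for illustration and do not come from the book. The code also assumes that a SageMaker model with the given name already exists and that AWS credentials are configured.

```python
def build_production_variant(model_name, instance_type="ml.m5.large", instance_count=2):
    """Build one ProductionVariants entry for create_endpoint_config.

    Hypothetical helper: it refuses a single-instance deployment because one
    instance leaves no spare capacity if that instance fails.
    """
    if instance_count < 2:
        raise ValueError("use at least two instances for a highly available endpoint")
    return {
        "VariantName": "AllTraffic",  # assumed variant name
        "ModelName": model_name,  # must refer to an existing SageMaker model
        "InstanceType": instance_type,  # assumed default; choose per workload
        "InitialInstanceCount": instance_count,
        "InitialVariantWeight": 1.0,
    }


def create_multi_instance_endpoint(endpoint_name, model_name, instance_count=2):
    """Create an endpoint config and an endpoint backed by several instances."""
    # Imported here so the helper above can be used without boto3 or AWS access.
    import boto3

    sagemaker_client = boto3.client("sagemaker")
    config_name = f"{endpoint_name}-config"  # assumed naming convention

    sagemaker_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            build_production_variant(model_name, instance_count=instance_count)
        ],
    )
    sagemaker_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
    return config_name
```

Calling create_multi_instance_endpoint("my-endpoint", "my-model") would request an endpoint with two instances; both names here are placeholders. The placement of those instances across Availability Zones is handled by SageMaker and is not controlled by this code.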

Endpoints autoscaling...