Machine Learning Engineering on AWS

By : Joshua Arvin Lat

Machine Learning Engineering on AWS

By: Joshua Arvin Lat

Overview of this book

There is a growing need for professionals with experience in working on machine learning (ML) engineering requirements as well as those with knowledge of automating complex MLOps pipelines in the cloud. This book explores a variety of AWS services, such as Amazon Elastic Kubernetes Service, AWS Glue, AWS Lambda, Amazon Redshift, and AWS Lake Formation, which ML practitioners can leverage to meet various data engineering and ML engineering requirements in production. This machine learning book covers the essential concepts as well as step-by-step instructions that are designed to help you get a solid understanding of how to manage and secure ML workloads in the cloud. As you progress through the chapters, you’ll discover how to use several container and serverless solutions when training and deploying TensorFlow and PyTorch deep learning models on AWS. You’ll also delve into proven cost optimization techniques as well as data privacy and model privacy preservation strategies in detail as you explore best practices when using each AWS. By the end of this AWS book, you'll be able to build, scale, and secure your own ML systems and pipelines, which will give you the experience and confidence needed to architect custom solutions using a variety of AWS services for ML engineering requirements.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Part 1: Getting Started with Machine Learning Engineering on AWS

Free Chapter

Chapter 1: Introduction to ML Engineering on AWS

Technical requirements

What is expected from ML engineers?

How ML engineers can get the most out of AWS

Essential prerequisites

Preparing the dataset

AutoML with AutoGluon

Getting started with SageMaker and SageMaker Studio

No-code machine learning with SageMaker Canvas

AutoML with SageMaker Autopilot

Summary

Further reading

Chapter 2: Deep Learning AMIs

Technical requirements

Getting started with Deep Learning AMIs

Launching an EC2 instance using a Deep Learning AMI

Downloading the sample dataset

Training an ML model

Loading and evaluating the model

Cleaning up

Understanding how AWS pricing works for EC2 instances

Summary

Further reading

Chapter 3: Deep Learning Containers

Technical requirements

Getting started with AWS Deep Learning Containers

Essential prerequisites

Using AWS Deep Learning Containers to train an ML model

Serverless ML deployment with Lambda’s container image support

Summary

Further reading

Part 2:Solving Data Engineering and Analysis Requirements

Chapter 4: Serverless Data Management on AWS

Technical requirements

Getting started with serverless data management

Preparing the essential prerequisites

Running analytics at scale with Amazon Redshift Serverless

Setting up Lake Formation

Using Amazon Athena to query data in Amazon S3

Summary

Further reading

Chapter 5: Pragmatic Data Processing and Analysis

Technical requirements

Getting started with data processing and analysis

Preparing the essential prerequisites

Automating data preparation and analysis with AWS Glue DataBrew

Preparing ML data with Amazon SageMaker Data Wrangler

Summary

Further reading

Part 3: Diving Deeper with Relevant Model Training and Deployment Solutions

Chapter 6: SageMaker Training and Debugging Solutions

Technical requirements

Getting started with the SageMaker Python SDK

Preparing the essential prerequisites

Training an image classification model with the SageMaker Python SDK

Using the Debugger Insights Dashboard

Utilizing Managed Spot Training and Checkpoints

Cleaning up

Summary

Further reading

Chapter 7: SageMaker Deployment Solutions

Technical requirements

Getting started with model deployments in SageMaker

Preparing the pre-trained model artifacts

Preparing the SageMaker script mode prerequisites

Deploying a pre-trained model to a real-time inference endpoint

Deploying a pre-trained model to a serverless inference endpoint

Deploying a pre-trained model to an asynchronous inference endpoint

Cleaning up

Deployment strategies and best practices

Summary

Further reading

Part 4:Securing, Monitoring, and Managing Machine Learning Systems and Environments

Chapter 8: Model Monitoring and Management Solutions

Technical prerequisites

Registering models to SageMaker Model Registry

Deploying models from SageMaker Model Registry

Enabling data capture and simulating predictions

Scheduled monitoring with SageMaker Model Monitor

Analyzing the captured data

Deleting an endpoint with a monitoring schedule

Cleaning up

Summary

Further reading

Chapter 9: Security, Governance, and Compliance Strategies

Managing the security and compliance of ML environments

Preserving data privacy and model privacy

Establishing ML governance

Summary

Further reading

Part 5:Designing and Building End-to-end MLOps Pipelines

Chapter 10: Machine Learning Pipelines with Kubeflow on Amazon EKS

Technical requirements

Diving deeper into Kubeflow, Kubernetes, and EKS

Preparing the essential prerequisites

Setting up Kubeflow on Amazon EKS

Running our first Kubeflow pipeline

Using the Kubeflow Pipelines SDK to build ML workflows

Cleaning up

Summary

Further reading

Chapter 11: Machine Learning Pipelines with SageMaker Pipelines

Technical requirements

Diving deeper into SageMaker Pipelines

Preparing the essential prerequisites

Running our first pipeline with SageMaker Pipelines

Creating Lambda functions for deployment

Testing our ML inference endpoint

Completing the end-to-end ML pipeline

Cleaning up

Summary

Further reading

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Introduction to ML Engineering on AWS

Most of us started our machine learning (ML) journey by training our first ML model using a sample dataset on our laptops or home computers. Things are somewhat straightforward until we need to work with much larger datasets and run our ML experiments in the cloud. It also becomes more challenging once we need to deploy our trained models to production-level inference endpoints or web servers. There are a lot of things to consider when designing and building ML systems and these are just some of the challenges data scientists and ML engineers face when working on real-life requirements. That said, we must use the right platform, along with the right set of tools, when performing ML experiments and deployments in the cloud.

At this point, you might be wondering why we should even use a cloud platform when running our workloads. Can’t we build this platform ourselves? Perhaps you might be thinking that building and operating your own data center is a relatively easy task. In the past, different teams and companies have tried setting up infrastructure within their data centers and on-premise hardware. Over time, these companies started migrating their workloads to the cloud as they realized how hard and expensive it was to manage and operate data centers. A good example of this would be the Netflix team, which migrated their resources to the AWS cloud. Migrating to the cloud allowed them to scale better and allowed them to have a significant increase in service availability.

The Amazon Web Services (AWS) platform provides a lot of services and capabilities that can be used by professionals and companies around the world to manage different types of workloads in the cloud. These past couple of years, AWS has announced and released a significant number of services, capabilities, and features that can be used for production-level ML experiments and deployments as well. This is due to the increase in ML workloads being migrated to the cloud globally. As we go through each of the chapters in this book, we will have a better understanding of how different services are used to solve the challenges when productionizing ML models.

The following diagram shows the hands-on journey for this chapter:

Figure 1.1 – Hands-on journey for this chapter

In this introductory chapter, we will focus on getting our feet wet by trying out different options when building an ML model on AWS. As shown in the preceding diagram, we will use a variety of AutoML services and solutions to build ML models that can help us predict if a hotel booking will be cancelled or not based on the information available. We will start by setting up a Cloud9 environment, which will help us run our code through an integrated development environment (IDE) in our browser. In this environment, we will generate a realistic synthetic dataset using a deep learning model called the Conditional Generative Adversarial Network. We will upload this dataset to Amazon S3 using the AWS CLI. Inside the Cloud9 environment, we will also install AutoGluon and run an AutoML experiment to train and generate multiple models using the synthetic dataset. Finally, we will use SageMaker Canvas and SageMaker Autopilot to run AutoML experiments using the uploaded dataset in S3. If you are wondering what these fancy terms are, keep reading as we demystify each of these in this chapter.

In this chapter, we will cover the following topics:

What is expected from ML engineers?
How ML engineers can get the most out of AWS
Essential prerequisites
Preparing the dataset
AutoML with AutoGluon
Getting started with SageMaker and SageMaker Canvas
No-code machine learning with SageMaker Canvas
AutoML with SageMaker Autopilot

In addition to getting our feet wet using key ML services, libraries, and tools to perform AutoML experiments, this introductory chapter will help us gain a better understanding of several ML and ML engineering concepts that will be relevant to the succeeding chapters of this book. With this in mind, let’s get started!

Machine Learning Engineering on AWS

By : Joshua Arvin Lat

Machine Learning Engineering on AWS

By: Joshua Arvin Lat

Overview of this book

Related Content you might be interested in

Current Title:

Machine Learning Engineering on AWS

Machine Learning with Amazon SageMaker Cookbook

Building and Automating Penetration Testing Labs in the Cloud

Getting Started with Amazon SageMaker Studio

Introduction to ML Engineering on AWS