Machine Learning Engineering on AWS

By: Joshua Arvin Lat

Overview of this book

There is a growing need for professionals with experience in working on machine learning (ML) engineering requirements as well as those with knowledge of automating complex MLOps pipelines in the cloud. This book explores a variety of AWS services, such as Amazon Elastic Kubernetes Service, AWS Glue, AWS Lambda, Amazon Redshift, and AWS Lake Formation, which ML practitioners can leverage to meet various data engineering and ML engineering requirements in production. This machine learning book covers the essential concepts as well as step-by-step instructions that are designed to help you get a solid understanding of how to manage and secure ML workloads in the cloud. As you progress through the chapters, you’ll discover how to use several container and serverless solutions when training and deploying TensorFlow and PyTorch deep learning models on AWS. You’ll also delve into proven cost optimization techniques as well as data privacy and model privacy preservation strategies in detail as you explore best practices when using each AWS service. By the end of this AWS book, you’ll be able to build, scale, and secure your own ML systems and pipelines, which will give you the experience and confidence needed to architect custom solutions using a variety of AWS services for ML engineering requirements.

AutoML with AutoGluon

Previously, we discussed what hyperparameters are. When training and tuning ML models, it is important for us to know that the performance of an ML model depends on the algorithm, the training data, and the hyperparameter configuration that’s used when training the model. Other input configuration parameters may also affect the performance of the model, but we’ll focus on these three for now. Instead of training a single model, teams build multiple models using a variety of hyperparameter configurations. Changes and tweaks in the hyperparameter configuration affect the performance of a model – some lead to better performance, while others lead to worse performance. It takes time to try out all possible combinations of hyperparameter configurations, especially if the model tuning process is not automated.
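To get a feel for why trying every combination by hand is slow, here is a minimal sketch that enumerates a small grid of hyperparameter values. The hyperparameter names, the candidate values, and the 10-minute training time are all assumptions made up for illustration; they do not come from the book's dataset or from any specific algorithm.

```python
from itertools import product

# Hypothetical search space: names and values are illustrative only.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "max_depth": [3, 5, 7, 9],
    "n_estimators": [100, 200, 500],
}


def all_configurations(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(all_configurations(search_space))

# 3 * 4 * 3 combinations in total.
print(len(configs))

# Assumed cost of a single training job, in minutes.
minutes_per_job = 10
print(len(configs) * minutes_per_job / 60, "hours if run one at a time")
```

Adding one more hyperparameter with five candidate values would multiply the number of training jobs by five, which is why this search is usually automated.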

Over the past few years, several libraries, frameworks, and services have allowed teams to use automated machine learning (AutoML) to automate different parts of the ML process. Initially, AutoML tools focused on automating hyperparameter optimization (HPO) to obtain the optimal combination of hyperparameter values. Instead of spending hours (or even days) manually trying different combinations of hyperparameters when running training jobs, we just need to configure, run, and wait for an automated program to find the optimal set of hyperparameter values. For years, ML practitioners have had access to several tools and libraries focused on automated hyperparameter optimization. Over time, other aspects and processes of the ML workflow were automated and included in the AutoML pipeline.

There are several tools and services available for AutoML and one of the most popular options is AutoGluon. With AutoGluon, we can train multiple models using different algorithms and evaluate them with just a few lines of code:

Figure 1.12 – AutoGluon leaderboard – models trained using a variety of algorithms

Similar to what is shown in the preceding screenshot, we can also compare the generated models using a leaderboard. In this chapter, we’ll use AutoGluon with a tabular dataset. However, it is important to note that AutoGluon also supports performing AutoML tasks for text and image data.

Setting up and installing AutoGluon

Before using AutoGluon, we need to install it. It should take a minute or so to complete the installation process:

Run the following commands in the terminal to install and update the prerequisites before we install AutoGluon:

```
python3 -m pip install -U "mxnet<2.0.0"
python3 -m pip install numpy
python3 -m pip install cython
python3 -m pip install pyOpenSSL --upgrade
```

This book assumes that you are using the following versions or later: mxnet – 1.9.0, numpy – 1.19.5, and cython – 0.29.26.

Next, run the following command to install autogluon:
```
python3 -m pip install autogluon
```

This book assumes that you are using autogluon version 0.3.1 or later.

Important note

This step may take around 5 to 10 minutes to complete. Feel free to grab a cup of coffee or tea!
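If you want to confirm that the installed packages meet the versions assumed above, a short script along these lines may help. It is only a sketch: the `as_tuple()` and `check()` helpers are hypothetical names introduced here, and the version comparison is deliberately simple (it only looks at the leading numeric parts of each version string).

```python
from importlib import metadata

# Minimum versions stated in this section.
MINIMUMS = {
    "mxnet": "1.9.0",
    "numpy": "1.19.5",
    "cython": "0.29.26",
    "autogluon": "0.3.1",
}


def as_tuple(version):
    """Turn '1.19.5' into (1, 19, 5), ignoring non-numeric suffixes."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(package, minimum):
    """Return a one-line status message for a single package."""
    try:
        found = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    ok = as_tuple(found) >= as_tuple(minimum)
    status = "ok" if ok else "older than assumed"
    return f"{package}: {found} ({status}, book assumes >= {minimum})"


for name, minimum in MINIMUMS.items():
    print(check(name, minimum))
```

Save it to a file and run it with python3 in the same environment where the packages were installed.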

With AutoGluon installed in our Cloud9 environment, let’s proceed with our first AutoGluon AutoML experiment.

Performing your first AutoGluon AutoML experiment

If you have used scikit-learn or other ML libraries and frameworks before, using AutoGluon should be fairly straightforward since it uses a very similar set of methods, such as fit() and predict(). Follow these steps:

To start, run the following command in the terminal:
```
ipython
```

This will open the IPython Read-Eval-Print-Loop (REPL)/interactive shell. We will use it much as we would use the standard Python shell.

Inside the console, type in (or copy) the following block of code. Make sure that you press Enter after typing the closing parenthesis:
```
from autogluon.tabular import (
    TabularDataset,
    TabularPredictor
)
```
Now, let’s load the synthetic data stored in the bookings.train.csv and bookings.test.csv files into the train_data and test_data variables, respectively, by running the following statements:
```
train_loc = 'tmp/bookings.train.csv'
test_loc = 'tmp/bookings.test.csv'
train_data = TabularDataset(train_loc)
test_data = TabularDataset(test_loc)
```

Since AutoGluon’s TabularDataset is a subclass of the pandas DataFrame, we can use different DataFrame methods on train_data and test_data such as head(), describe(), memory_usage(), and more.

Next, run the following lines of code:

```
label = 'is_cancelled'
save_path = 'tmp'
tp = TabularPredictor(label=label, path=save_path)
predictor = tp.fit(train_data)
```

Here, we specify is_cancelled as the target variable of the AutoML task and the tmp directory as the location where the generated models will be stored. This block of code will use the training data we have provided to train multiple models using different algorithms. AutoGluon will automatically detect that we are dealing with a binary classification problem and generate multiple binary classifier models using a variety of ML algorithms.

Important note

Inside the tmp/models directory, we should find CatBoost, ExtraTreesEntr, and ExtraTreesGini, along with other directories corresponding to the algorithms used in the AutoML task. Each of these directories contains a model.pkl file that contains the serialized model. Why do we have multiple models? Behind the scenes, AutoGluon runs a significant number of training experiments using a variety of algorithms, along with different combinations of hyperparameter values, to produce the “best” model. The “best” model is selected using a certain evaluation metric that helps identify which model performs better than the rest. For example, if the evaluation metric that’s used is accuracy, then a model with an accuracy score of 90% (which gets 9 correct answers every 10 tries) is “better” than a model with an accuracy score of 80% (which gets 8 correct answers every 10 tries). That said, once the models have been generated and evaluated, AutoGluon simply chooses the model with the highest evaluation metric value (for example, accuracy) and tags it as the “best model.”
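The selection idea described in the note can be sketched in a few lines of plain Python. This is a simplified illustration, not AutoGluon's internal code: the accuracy scores below are made-up numbers, and `pick_best()` is a hypothetical helper name.

```python
# Hypothetical accuracy scores for three of the generated models.
validation_scores = {
    "CatBoost": 0.90,
    "ExtraTreesEntr": 0.80,
    "ExtraTreesGini": 0.85,
}


def pick_best(scores):
    """Return the model name with the highest metric value."""
    return max(scores, key=scores.get)


best = pick_best(validation_scores)

# A leaderboard-style ranking, best model first.
ranked = sorted(validation_scores.items(), key=lambda item: item[1], reverse=True)

print("best model:", best)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

This assumes a metric where higher is better, such as accuracy. For an error-based metric where lower is better, the comparison would need to be reversed.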

Now that we have our “best model” ready, the next step is to evaluate it using the test dataset. Let’s prepare the test dataset for inference by removing the target label:
```
y_test = test_data[label]
test_data_no_label = test_data.drop(columns=[label])
```
With everything ready, let’s use the predict() method to predict the is_cancelled column value of the test dataset provided as the payload:
```
y_pred = predictor.predict(test_data_no_label)
```
Now that we have the actual y values (y_test) and the predicted y values (y_pred), let’s quickly check the performance of the trained model by using the evaluate_predictions() method:
```
predictor.evaluate_predictions(
    y_true=y_test, 
    y_pred=y_pred, 
    auxiliary_metrics=True
)
```

The previous block of code should yield performance metric values similar to the following:

```
{'accuracy': 0.691...,
 'balanced_accuracy': 0.502...,
 'mcc': 0.0158...,
 'f1': 0.0512...,
 'precision': 0.347...,
 'recall': 0.0276...}
```

In this step, we compare the actual values with the predicted values for the target column using a variety of formulas that measure how close these values are to each other. Here, the goal of the trained models is to make as few mistakes as possible on unseen data. Better models generally have better scores for performance metrics such as accuracy, Matthews correlation coefficient (MCC), and F1-score. We won’t go into the details of how model performance metrics work here. Feel free to check out https://bit.ly/3zn2crv for more information.
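For readers who want a rough idea of what these metric names refer to, the following sketch computes them from scratch using the standard textbook definitions for binary classification. The labels are a tiny made-up example rather than the bookings dataset, `binary_metrics()` is a hypothetical helper, and AutoGluon's own implementation may handle edge cases differently.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Compute common binary classification metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Report 0.0 instead of failing when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": ratio(
            tp * tn - fp * fn,
            sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        ),
        "f1": ratio(2 * precision * recall, precision + recall),
        "precision": precision,
        "recall": recall,
    }


# Made-up labels: 1 = cancelled, 0 = not cancelled.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy on its own can be misleading: in the output shown earlier, the accuracy is about 0.69 while the recall is below 0.03, which means very few of the actual cancellations in the test set were identified.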

Now that we are done with our quick experiment, let’s exit the IPython shell:
```
exit()
```

There’s more we can do using AutoGluon but this should help us appreciate how easy it is to use AutoGluon for AutoML experiments. There are other methods we can use, such as leaderboard(), get_model_best(), and feature_importance(), so feel free to check out https://auto.gluon.ai/stable/index.html for more information.
