Learn Amazon SageMaker

By: Julien Simon

Overview of this book

Amazon SageMaker enables you to quickly build, train, and deploy machine learning (ML) models at scale, without managing any infrastructure. It helps you focus on the ML problem at hand and deploy high-quality models by removing the heavy lifting typically involved in each step of the ML process. This book is a comprehensive guide for data scientists and ML developers who want to learn the ins and outs of Amazon SageMaker. You’ll understand how to use the various modules of SageMaker as a single toolset to solve the challenges faced in ML. As you progress, you’ll cover features such as AutoML, built-in algorithms and frameworks, and the option of writing your own code and algorithms to build ML models. Later, the book will show you how to integrate Amazon SageMaker with popular deep learning libraries such as TensorFlow and PyTorch to increase the capabilities of existing models. You’ll also learn how to get models to production faster, with minimal effort and at a lower cost. Finally, you’ll explore how to use Amazon SageMaker Debugger to analyze, detect, and highlight problems so that you can understand the current model state and improve model accuracy. By the end of this book, you’ll be able to use Amazon SageMaker on the full spectrum of ML workflows, from experimentation, training, and monitoring to scaling, deployment, and automation.
Table of Contents (19 chapters)
Section 1: Introduction to Amazon SageMaker
Section 2: Building and Training Models
Section 3: Diving Deeper on Training
Section 4: Managing Models in Production

Exploring the capabilities of Amazon SageMaker

Amazon SageMaker was launched at AWS re:Invent 2017. Since then, a lot of new features have been added: you can see the full (and ever-growing) list at

In this section, you'll learn about the main capabilities of Amazon SageMaker and their purpose. Don't worry, we'll dive deep on each of them in later chapters. We will also talk about the SageMaker Application Programming Interfaces (APIs), and the Software Development Kits (SDKs) that implement them.

The main capabilities of Amazon SageMaker

At the core of Amazon SageMaker is the ability to build, train, optimize, and deploy models on fully managed infrastructure, and at any scale. This lets you focus on studying and solving the ML problem at hand, instead of spending time and resources on building and managing infrastructure. Simply put, you can go from building to training to deploying more quickly. Let's zoom in on each step and highlight relevant SageMaker capabilities.


Building

Amazon SageMaker provides you with two development environments:

  • Notebook instances: Fully managed Amazon EC2 instances that come preinstalled with the most popular tools and libraries: Jupyter, Anaconda, and so on.
  • Amazon SageMaker Studio: A full-fledged integrated development environment for ML projects.

When it comes to experimenting with algorithms, you can choose from the following:

  • A collection of 17 built-in algorithms for ML and deep learning, already implemented and optimized to run efficiently on AWS. No ML code to write!
  • A collection of built-in open source frameworks (TensorFlow, PyTorch, Apache MXNet, scikit-learn, and more), where you simply bring your own code.
  • Your own code running in your own container: custom Python, R, C++, Java, and so on.
  • Algorithms and pretrained models from AWS Marketplace for ML.

In addition, Amazon SageMaker Autopilot uses AutoML to automatically build, train, and optimize models without the need to write a single line of ML code.

Amazon SageMaker also includes two major capabilities that help with building and preparing datasets:

  • Amazon SageMaker Ground Truth: Annotate datasets at any scale. Workflows for popular use cases are built in (image detection, entity extraction, and more), and you can implement your own. Annotation jobs can be distributed to workers that belong to private, third-party, or public workforces.
  • Amazon SageMaker Processing: Run data processing and model evaluation batch jobs, using either scikit-learn or Spark.


Training

As mentioned earlier, Amazon SageMaker takes care of provisioning and managing your training infrastructure. You'll never spend any time managing servers, and you'll be able to focus on ML. On top of this, SageMaker brings advanced capabilities such as the following:

  • Managed storage uses Amazon S3, Amazon EFS, or Amazon FSx for Lustre, depending on your performance requirements.
  • Managed spot training uses Amazon EC2 Spot Instances to reduce training costs by up to 80%.
  • Distributed training automatically distributes large-scale training jobs on a cluster of managed instances.
  • Pipe mode streams infinitely large datasets from Amazon S3 to the training instances, removing the need to copy data around.
  • Automatic model tuning runs hyperparameter optimization in order to deliver high-accuracy models more quickly.
  • Amazon SageMaker Experiments easily tracks, organizes, and compares all your SageMaker jobs.
  • Amazon SageMaker Debugger captures the internal model state during training, inspects it to observe how the model learns, and detects unwanted conditions that hurt accuracy.
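To make the managed spot training figure concrete, here is a minimal sketch of how the savings of a single job can be worked out. It assumes savings are measured as the share of training time that was not billed, that is, (1 - billable seconds / training seconds) × 100. The function name and the sample numbers are hypothetical and only serve as an illustration.

```python
def spot_savings_percent(training_seconds: float, billable_seconds: float) -> float:
    """Return the percentage of training time that was not billed."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Hypothetical job: 1,000 seconds of training, of which 250 seconds were billed
print(round(spot_savings_percent(1000, 250), 1))
```

A job that is billed for a quarter of its training time therefore saves 75%, which is in the range that the "up to 80%" claim refers to.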


Deploying

Just as with training, Amazon SageMaker takes care of all your deployment infrastructure, and brings a slew of additional features:

  • Real-time endpoints: This creates an HTTPS API that serves predictions from your model. As you would expect, autoscaling is available.
  • Batch transform: This uses a model to predict data in batch mode.
  • Infrastructure monitoring with Amazon CloudWatch: This helps you to view real-time metrics and keep track of infrastructure performance.
  • Amazon SageMaker Model Monitor: This captures data sent to an endpoint, and compares it with a baseline to identify and alert on data quality issues (missing features, data drift, and more).
  • Amazon SageMaker Neo: This compiles models for a specific hardware architecture, including embedded platforms, and deploys an optimized version using a lightweight runtime.
  • Amazon Elastic Inference: This adds fractional GPU acceleration to CPU-based instances in order to find the best cost/performance ratio for your prediction infrastructure.
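As an illustration of what calling a real-time endpoint involves, here is a minimal sketch that sends CSV rows to an endpoint through the `invoke_endpoint()` API of the SageMaker runtime client in boto3. The helper `predict_csv`, the endpoint name, and the `FakeRuntimeClient` stand-in are hypothetical: the fake client only mimics the shape of the response so that the sketch runs without AWS credentials, and the values it returns are placeholders, not predictions.

```python
import io


def predict_csv(runtime_client, endpoint_name, rows):
    """Send rows as CSV to an endpoint and return the raw response text."""
    payload = "\n".join(",".join(str(value) for value in row) for row in rows)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload.encode("utf-8"),
    )
    # The response body is a stream that has to be read and decoded
    return response["Body"].read().decode("utf-8")


class FakeRuntimeClient:
    """Hypothetical stand-in for boto3.client('sagemaker-runtime')."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        # Remember the request and return one placeholder value per input row
        self.last_request = {
            "EndpointName": EndpointName,
            "ContentType": ContentType,
            "Body": Body,
        }
        row_count = len(Body.decode("utf-8").splitlines())
        return {"Body": io.BytesIO(",".join(["0.5"] * row_count).encode("utf-8"))}


client = FakeRuntimeClient()
print(predict_csv(client, "my-endpoint", [[5.1, 3.5], [6.2, 2.9]]))
```

With valid credentials and a deployed endpoint, the fake client would be replaced by `boto3.client("sagemaker-runtime")`, and the content type would have to match what the deployed model expects.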

The Amazon SageMaker API

Just like all other AWS services, Amazon SageMaker is driven by APIs that are implemented in the language SDKs supported by AWS. In addition, a dedicated Python SDK, also known as the 'SageMaker SDK', is available. Let's look at both, and discuss their respective benefits.

The AWS language SDKs

Language SDKs implement service-specific APIs for all AWS services: S3, EC2, and so on. Of course, they also include SageMaker APIs, which are documented at

When it comes to data science and ML, Python is the most popular language, so let's take a look at the SageMaker APIs available in boto3, the AWS SDK for Python. These APIs are quite low level and verbose: for example, create_training_job() has a lot of JSON parameters that don't look very obvious. You can see some of them in the next screenshot. You may think that this doesn't look very appealing for everyday ML experimentation… and I would totally agree!

Figure 1.1 A partial view of the create_training_job() API in boto3
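To give a sense of that verbosity in text form, here is a minimal sketch that assembles a request for `create_training_job()` as a plain Python dictionary. The parameter names are those of the SageMaker API, but the helper `build_training_request` and every value passed to it (job name, image URI, role ARN, S3 locations) are hypothetical placeholders. The request is only built and printed, so nothing is sent to AWS.

```python
import json


def build_training_request(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri):
    """Assemble the nested parameters expected by create_training_job()."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "training",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.p3.2xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# All values below are placeholders for illustration only
request = build_training_request(
    job_name="my-training-job",
    image_uri="123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest",
    role_arn="arn:aws:iam::123456789012:role/my-sagemaker-role",
    train_s3_uri="s3://my_bucket/my_training_data/",
    output_s3_uri="s3://my_bucket/my_output/",
)
print(json.dumps(request, indent=2))
```

With valid credentials, such a request would be submitted with `boto3.client("sagemaker").create_training_job(**request)`. Even this reduced version needs seven top-level parameters, most of them nested.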

Indeed, these service-level APIs are not meant to be used for experimentation in notebooks. Their purpose is automation, through either bespoke scripts or Infrastructure-as-Code tools such as AWS CloudFormation and Terraform. Your DevOps team will use them to manage production, where they need full control over each possible parameter.

So, what should you use for experimentation? You should use the Amazon SageMaker SDK.

The Amazon SageMaker SDK

The Amazon SageMaker SDK is a Python SDK specific to Amazon SageMaker. You can find its documentation at


The code examples in this book are based on the first release of the SageMaker SDK v2, released in August 2020. For the sake of completeness, and to help you migrate your own notebooks, the companion GitHub repository includes examples for SDK v1 and v2.

Here, the abstraction level is much higher: the SDK contains objects for estimators, models, predictors, and so on. We're definitely back in ML territory.

For instance, this SDK makes it extremely easy and comfortable to fire up a training job (one line of code) and to deploy a model (one line of code). Infrastructure concerns are abstracted away, and we can focus on ML instead. Here's an example. Don't worry about the details for now:

from sagemaker.tensorflow import TensorFlow

# Configure the training job
my_estimator = TensorFlow(
    '',  # entry point: your training script
    role=my_sageMaker_role,
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    framework_version='2.1.0')
# Train the model
my_estimator.fit('s3://my_bucket/my_training_data/')
# Deploy the model to an HTTPS endpoint
my_predictor = my_estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.2xlarge')

Now that we know a little more about Amazon SageMaker, let's see how it helps typical customers make their ML workflows more agile and more efficient.