Learn Amazon SageMaker

By: Julien Simon

Overview of this book

Amazon SageMaker enables you to quickly build, train, and deploy machine learning (ML) models at scale, without managing any infrastructure. It helps you focus on the ML problem at hand and deploy high-quality models by removing the heavy lifting typically involved in each step of the ML process. This book is a comprehensive guide for data scientists and ML developers who want to learn the ins and outs of Amazon SageMaker. You’ll understand how to use the various modules of SageMaker as a single toolset to solve the challenges faced in ML. As you progress, you’ll cover features such as AutoML, built-in algorithms and frameworks, and the option of writing your own code and algorithms to build ML models. Later, the book will show you how to integrate Amazon SageMaker with popular deep learning libraries such as TensorFlow and PyTorch to increase the capabilities of existing models. You’ll also learn to get models to production faster, with minimum effort and at a lower cost. Finally, you’ll explore how to use Amazon SageMaker Debugger to analyze, detect, and highlight problems to understand the current model state and improve model accuracy. By the end of this book, you’ll be able to use Amazon SageMaker on the full spectrum of ML workflows, from experimentation, training, and monitoring to scaling, deployment, and automation.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1: Introduction to Amazon SageMaker

Chapter 1: Introduction to Amazon SageMaker

Technical requirements

Exploring the capabilities of Amazon SageMaker

Demonstrating the strengths of Amazon SageMaker

Setting up Amazon SageMaker on your local machine

Setting up an Amazon SageMaker notebook instance

Setting up Amazon SageMaker Studio

Summary

Chapter 2: Handling Data Preparation Techniques

Technical requirements

Discovering Amazon SageMaker Ground Truth

Exploring Amazon SageMaker Processing

Processing data with other AWS services

Summary

Section 2: Building and Training Models

Chapter 3: AutoML with Amazon SageMaker Autopilot

Technical requirements

Discovering Amazon SageMaker Autopilot

Using SageMaker Autopilot in SageMaker Studio

Using the SageMaker Autopilot SDK

Diving deep on SageMaker Autopilot

Summary

Chapter 4: Training Machine Learning Models

Technical requirements

Discovering the built-in algorithms in Amazon SageMaker

Training and deploying models with built-in algorithms

Using the SageMaker SDK with built-in algorithms

Working with more built-in algorithms

Summary

Chapter 5: Training Computer Vision Models

Technical requirements

Discovering the CV built-in algorithms in Amazon SageMaker

Preparing image datasets

Using the built-in CV algorithms

Summary

Chapter 6: Training Natural Language Processing Models

Technical requirements

Discovering the NLP built-in algorithms in Amazon SageMaker

Preparing natural language datasets

Using the built-in algorithms for NLP

Summary

Chapter 7: Extending Machine Learning Services Using Built-In Frameworks

Technical requirements

Discovering the built-in frameworks in Amazon SageMaker

Running your framework code on Amazon SageMaker

Using the built-in frameworks

Summary

Chapter 8: Using Your Algorithms and Code

Technical requirements

Understanding how SageMaker invokes your code

Using the SageMaker training toolkit with scikit-learn

Building a fully custom container for scikit-learn

Building a fully custom container for R

Training and deploying with XGBoost and MLflow

Training and deploying with XGBoost and Sagify

Summary

Section 3: Diving Deeper on Training

Chapter 9: Scaling Your Training Jobs

Technical requirements

Understanding when and how to scale

Streaming datasets with pipe mode

Using other storage services

Distributing training jobs

Training an Image Classification model on ImageNet

Summary

Chapter 10: Advanced Training Techniques

Technical requirements

Optimizing training costs with Managed Spot Training

Optimizing hyperparameters with Automatic Model Tuning

Exploring models with SageMaker Debugger

Summary

Section 4: Managing Models in Production

Chapter 11: Deploying Machine Learning Models

Technical requirements

Examining model artifacts

Managing real-time endpoints

Deploying batch transformers

Deploying inference pipelines

Monitoring predictions with Amazon SageMaker Model Monitor

Deploying models to container services

Summary

Chapter 12: Automating Machine Learning Workflows

Technical requirements

Automating with AWS CloudFormation

Automating with the AWS Cloud Development Kit

Automating with AWS Step Functions

Summary

Chapter 13: Optimizing Prediction Cost and Performance

Technical requirements

Autoscaling an endpoint

Deploying a multi-model endpoint

Deploying a model with Amazon Elastic Inference

Compiling models with Amazon SageMaker Neo

Building a cost optimization checklist

Summary

Preparing natural language datasets

For the CV algorithms in the previous chapter, data preparation focused on the technical format required for the dataset (image format, RecordIO, or augmented manifest). The images themselves weren't processed.

Things are quite different for NLP algorithms. Text needs to be heavily processed, converted, and saved in the right format. In most learning resources, these steps are abbreviated or even ignored. Data is already "automagically" ready for training, leaving the reader frustrated and sometimes at a loss as to how to prepare their own datasets.

No such thing here! In this section, you'll learn how to prepare NLP datasets in different formats. Once again, get ready to learn a lot!

Let's start with preparing data for BlazingText.

Preparing data for classification with BlazingText

BlazingText expects labeled input data in the same format as FastText:

A plain text file, with one sample per line.
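As a rough illustration of this format, the sketch below turns a couple of labeled samples into FastText-style lines, where each line starts with a `__label__` prefix followed by space-separated tokens. The sample sentences, the regex-based tokenizer, and the `train.txt` file name are all assumptions made for this example; a real pipeline would likely use a proper tokenizer and your own dataset.

```python
import re


def to_fasttext_line(label: str, text: str) -> str:
    """Format one labeled sample as a FastText-style line."""
    # Lowercase the text and split punctuation from words so that
    # every token ends up separated by a single space.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return f"__label__{label} " + " ".join(tokens)


def write_dataset(path: str, lines: list) -> None:
    """Write one sample per line to a plain text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


# Hypothetical samples, used only to illustrate the format.
samples = [
    ("positive", "Great product, works as advertised!"),
    ("negative", "Stopped working after two days."),
]

lines = [to_fasttext_line(label, text) for label, text in samples]

if __name__ == "__main__":
    write_dataset("train.txt", lines)  # hypothetical output file name
    for line in lines:
        print(line)
```

The simple regular expression here is only a stand-in for tokenization: it keeps the example dependency-free, but it will not handle contractions or language-specific rules the way a dedicated NLP tokenizer would.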
