Learn Amazon SageMaker

By: Julien Simon

Overview of this book

Amazon SageMaker enables you to quickly build, train, and deploy machine learning (ML) models at scale, without managing any infrastructure. It helps you focus on the ML problem at hand and deploy high-quality models by removing the heavy lifting typically involved in each step of the ML process. This book is a comprehensive guide for data scientists and ML developers who want to learn the ins and outs of Amazon SageMaker. You’ll understand how to use the various modules of SageMaker as a single toolset to solve the challenges faced in ML. As you progress, you’ll cover features such as AutoML, built-in algorithms and frameworks, and the option of writing your own code and algorithms to build ML models. Later, the book will show you how to integrate Amazon SageMaker with popular deep learning libraries such as TensorFlow and PyTorch to increase the capabilities of existing models. You’ll also learn how to get models to production faster, with minimal effort and at a lower cost. Finally, you’ll explore how to use Amazon SageMaker Debugger to analyze, detect, and highlight problems, so that you can understand the current model state and improve model accuracy. By the end of this book, you’ll be able to use Amazon SageMaker on the full spectrum of ML workflows, from experimentation, training, and monitoring to scaling, deployment, and automation.
Table of Contents (19 chapters)
Section 1: Introduction to Amazon SageMaker
Section 2: Building and Training Models
Section 3: Diving Deeper on Training
Section 4: Managing Models in Production

Demonstrating the strengths of Amazon SageMaker

Alice and Bob are both passionate, hardworking people who try their best to build great ML solutions. Unfortunately, a lot of things stand in their way and slow them down.

In this section, let's look at the challenges that they face in their daily projects, and how Amazon SageMaker could help them be more productive.

Solving Alice's problems

Alice has a PhD and works in a large public research lab. She's a trained data scientist, with a strong background in math and statistics. She spends her time on large scientific projects involving bulky datasets. Alice generally doesn't know much about IT and infrastructure, and she honestly doesn't care for these topics at all. Her focus is on advancing her research and publishing papers.

For her daily work, she can rely on her own powerful (but expensive) desktop workstation. She enjoys the fact that she can work on her own, but she can only experiment with a fraction of her dataset if she wants to keep training times reasonable.

She tries to maintain the software configuration of her machine herself, as IT doesn't know much about the esoteric tools she uses. When something goes wrong, she wastes precious hours fixing it, and that's frustrating.

When Alice wants to run large experiments, she has to use remote servers hosted in the computing centre: a farm of very powerful multi-GPU servers, connected to a petabyte of network-attached storage. Of course, she has to share these servers with other researchers. Every week, the team leads meet and try to prioritize projects and workloads: this is never easy, and decisions often need to be escalated to the lab director.

Let's see how SageMaker and cloud computing can help Alice.

Launching an inexpensive SageMaker notebook instance in minutes, Alice could start running some sample notebooks, and she would quickly become familiar with the service, as it's based on the same tools she already uses. Scaling up, she could then train her own model on a cluster of powerful GPU instances, created on demand with just a couple of lines of code. That's more computing power than she could ever have used in the computing centre, and she wouldn't have to set up anything!
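To make "a couple of lines of code" a little more concrete, here's a minimal sketch of what such a launch could look like with the SageMaker Python SDK, assuming version 2 parameter names. The role ARN, image URI, S3 locations, instance type, and instance count are all placeholders, and the two helper functions are hypothetical names introduced for illustration. The SDK is only imported when a job is actually launched, which also requires AWS credentials.

```python
def gpu_training_settings(role_arn, image_uri, output_s3_uri,
                          instance_count=4, instance_type="ml.p3.16xlarge"):
    """Collect the arguments describing an on-demand GPU training cluster.

    Hypothetical helper: keys follow SageMaker Python SDK v2 Estimator
    parameter names; the default instance type and count are assumptions.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "image_uri": image_uri,            # training container to run
        "role": role_arn,                  # IAM role assumed by the job
        "instance_count": instance_count,  # size of the training cluster
        "instance_type": instance_type,    # multi-GPU instance type
        "output_path": output_s3_uri,      # where the model artifact is saved
    }


def launch_training(settings, training_s3_uri):
    """Start the training job (needs the SageMaker SDK and AWS credentials)."""
    # Imported lazily so the settings helper works without the SDK installed.
    from sagemaker.estimator import Estimator

    estimator = Estimator(**settings)
    # "training" is the name of the input channel; the value is an S3 prefix.
    estimator.fit({"training": training_s3_uri})
    return estimator


# Placeholder values, for illustration only.
settings = gpu_training_settings(
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    image_uri="example-training-image-uri",
    output_s3_uri="s3://example-bucket/output",
)
```

Calling `launch_training(settings, "s3://example-bucket/training")` would then provision the instances for the duration of the job only, which is what keeps Alice away from any cluster setup.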

Thanks to the automatic model tuning feature in SageMaker, Alice would also be able to significantly improve the accuracy of her models in just a few hours of parallel optimization. Again, doing this with her previous setup would have been impossible due to the lack of computing resources. Deploying models would be equally straightforward: adapting a couple of lines of code found in a sample notebook, Alice would use the batch transform feature to run predictions on her test dataset, again spending no time at all worrying about tools or infrastructure.
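The idea of parallel optimization can be illustrated without any cloud resources. The sketch below is not SageMaker's implementation: it runs a plain random search over two hyperparameters, evaluating several candidates at a time against a made-up scoring function that stands in for a real training job. The function names, search ranges, and scoring formula are all assumptions chosen for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def toy_validation_score(learning_rate, depth):
    """Stand-in for a training job: the score peaks at learning_rate=0.1, depth=6."""
    return 1.0 - 10 * (learning_rate - 0.1) ** 2 - ((depth - 6) / 10) ** 2


def random_search(num_trials=20, max_parallel=4, seed=0):
    """Evaluate random hyperparameter candidates, max_parallel at a time."""
    if num_trials < 1:
        raise ValueError("num_trials must be at least 1")
    rng = random.Random(seed)
    candidates = [
        {"learning_rate": rng.uniform(0.01, 0.3), "depth": rng.randint(2, 10)}
        for _ in range(num_trials)
    ]
    # Each worker plays the role of one training job running in parallel.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        scores = list(pool.map(lambda c: toy_validation_score(**c), candidates))
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return candidates[best_index], scores[best_index]


best_params, best_score = random_search()
print(best_params, best_score)
```

With a managed service, each candidate would be a full training job on its own instances, which is why this kind of search is hard to run on a single workstation or a shared cluster.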

Last but not least, keeping track of her expenses would be easy: the AWS console would tell her how much she's spent, which would be less than expected thanks to the on-demand nature of SageMaker infrastructure!

Solving Bob's problems

Bob is a DevOps engineer, and he's in charge of a large training cluster shared by a team of data scientists. They can start their distributed jobs in seconds, and it's just simpler for Bob to manage a single cluster. Auto Scaling is set up, but capacity planning is still needed to find the right number of EC2 instances and to optimize the cost using the right mix of Reserved, Spot, and On-Demand instances. Bob has a weekly meeting with the team to make sure they'll have enough instances… and they also ping him on Slack when they need extra capacity on the fly. Bob tries to automatically reduce capacity at night and on weekends when the cluster is less busy, but he's quite sure they're spending too much anyway. Oh, well.

Once models have been trained and validated, Bob uses Continuous Integration and Continuous Deployment (CI/CD) to deploy them automatically to the production Docker cluster. Bob maintains bespoke containers for training and prediction: libraries, dependencies, and in-house tools. That takes a bit of time, but he enjoys doing it. He just hopes that no one will ask him to support PyTorch and Apache MXNet too.

Let's see how Bob could use SageMaker to improve his ML workflows.

As SageMaker is based on Docker containers, Bob could get rid of his bespoke containers and use SageMaker's built-in counterparts. Migrating the training workloads to SageMaker would be pretty easy. This would help Bob get rid of his training cluster, and let every data scientist train completely on demand instead. With Managed Spot Training, Bob could certainly optimize training costs even more.
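As a rough sketch of how Managed Spot Training could fit in: in version 2 of the SageMaker Python SDK, an estimator accepts `use_spot_instances`, `max_run`, and `max_wait`, where `max_wait` must be at least as large as `max_run`. The two helpers below are hypothetical, the one-hour waiting allowance is an arbitrary choice, and the savings formula simply compares billable seconds with total training seconds, which is assumed here to be how a savings figure would be derived. All numbers are made up.

```python
def spot_training_settings(max_run_seconds, extra_wait_seconds=3600):
    """Build Spot-related estimator arguments (SDK v2 parameter names).

    Hypothetical helper; the default waiting allowance is an assumption.
    """
    if max_run_seconds <= 0 or extra_wait_seconds < 0:
        raise ValueError("durations must be positive")
    return {
        "use_spot_instances": True,
        "max_run": max_run_seconds,
        # max_wait covers training time plus time spent waiting for Spot capacity.
        "max_wait": max_run_seconds + extra_wait_seconds,
    }


def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved when only billable_seconds out of training_seconds are charged."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    if not 0 <= billable_seconds <= training_seconds:
        raise ValueError("billable_seconds must be between 0 and training_seconds")
    return 100 * (training_seconds - billable_seconds) / training_seconds


spot_settings = spot_training_settings(max_run_seconds=7200)
print(spot_settings)
print(spot_savings_percent(training_seconds=1000, billable_seconds=300))
```

These settings would be merged into the estimator arguments for a job, so data scientists could opt in to Spot capacity per training run rather than relying on Bob to tune the instance mix of a shared cluster.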

The data science team would quickly adopt advanced features like distributed training, Pipe mode, and automatic model tuning. This would save them a lot of time, and the team would no longer have to maintain the kludgy code they have written to implement similar features.

Of course, Alice and Bob are fictional characters. Yet, I keep meeting many customers who share some (and sometimes all) of their pain points. That may be your case too, so please let me get you started with Amazon SageMaker.