Learn Amazon SageMaker - Second Edition

By: Julien Simon

Overview of this book

Amazon SageMaker enables you to quickly build, train, and deploy machine learning models at scale without managing any infrastructure. It helps you focus on the machine learning problem at hand and deploy high-quality models by eliminating the heavy lifting typically involved in each step of the ML process. This second edition will help data scientists and ML developers to explore new features such as SageMaker Data Wrangler, Pipelines, Clarify, Feature Store, and much more. You'll start by learning how to use various capabilities of SageMaker as a single toolset to solve ML challenges and progress to cover features such as AutoML, built-in algorithms and frameworks, and writing your own code and algorithms to build ML models. The book will then show you how to integrate Amazon SageMaker with popular deep learning libraries, such as TensorFlow and PyTorch, to extend the capabilities of existing models. You'll also see how automating your workflows can help you get to production faster with minimum effort and at a lower cost. Finally, you'll explore SageMaker Debugger and SageMaker Model Monitor to detect quality issues in training and production. By the end of this Amazon book, you'll be able to use Amazon SageMaker on the full spectrum of ML workflows, from experimentation, training, and monitoring to scaling, deployment, and automation.
Table of Contents (19 chapters)

  • Section 1: Introduction to Amazon SageMaker
  • Section 2: Building and Training Models
  • Section 3: Diving Deeper into Training
  • Section 4: Managing Models in Production

Setting up Amazon SageMaker Studio

Experimentation is a key part of the machine learning process. Developers and data scientists use a collection of open source tools and libraries for data exploration, data processing, and, of course, to evaluate candidate algorithms. Installing and maintaining these tools takes a fair amount of time, which would probably be better spent on studying the machine learning problem itself!

Amazon SageMaker Studio brings you the machine learning tools you need from experimentation to production. At its core is an integrated development environment based on Jupyter that makes it instantly familiar.

In addition, SageMaker Studio is integrated with other SageMaker capabilities, such as SageMaker Experiments to track and compare all jobs, SageMaker Autopilot to automatically create machine learning models, and more. A lot of operations can be achieved in just a few clicks, without having to write any code.

SageMaker Studio also further simplifies infrastructure management. You won't have to create notebook instances: SageMaker Studio provides you with compute environments that are readily available to run your notebooks.

Note

This section requires basic knowledge of Amazon S3, Amazon VPC, and Amazon IAM. If you're not familiar with them at all, please read the following documentation:

https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html

https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html

https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html

Now would also probably be a good time to take a look at (and bookmark) the SageMaker pricing page: https://aws.amazon.com/sagemaker/pricing/.

Onboarding to Amazon SageMaker Studio

You can access SageMaker Studio using any of these three options:

  • Use the quick start procedure: This is the easiest option for individual accounts, and we'll walk through it in the following paragraphs.
  • Use AWS Single Sign-On (SSO): If your company has an SSO application set up, this is probably the best option. You can learn more about SSO onboarding at https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-sso-users.html. Please contact your IT administrator for details.
  • Use Amazon IAM: If your company doesn't use SSO, this is probably the best option. You can learn more about IAM onboarding at https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-iam.html. Again, please contact your IT administrator for details.

Onboarding with the quick start procedure

There are several steps to the quick start procedure:

  1. First, open the AWS Console in one of the regions where Amazon SageMaker Studio is available, for example, https://us-east-2.console.aws.amazon.com/sagemaker/.
  2. As shown in the following screenshot, the left-hand vertical panel has a link to SageMaker Studio:
    Figure 1.5 – Opening SageMaker Studio

  3. Clicking on this link opens the onboarding screen, and you can see its first section in the next screenshot:
    Figure 1.6 – Running Quick start

  4. Let's select Quick start. Then, we enter the username we'd like to use to log in to SageMaker Studio, and we create a new IAM role as shown in the preceding screenshot. This opens the following screen:
    Figure 1.7 – Creating an IAM role

    The only decision we have to make here is whether we want to allow our notebook instance to access specific Amazon S3 buckets. Let's select Any S3 bucket and click on Create role. This is the most flexible setting for development and testing, but we'd want to apply much stricter settings for production. Of course, we can edit this role later on in the IAM console, or create a new one.

  5. Once we've clicked on Create role, we're back to the previous screen. Please make sure that project templates and JumpStart are enabled for this account (this should be the default setting).
  6. We just have to click on Submit to launch the onboarding procedure. Depending on your account setup, you may get an extra screen asking you to select a VPC and a subnet. I'd recommend selecting any subnet in your default VPC.
  7. A few minutes later, SageMaker Studio is in service, as shown in the following screenshot. We could add extra users if we needed to, but for now, let's just click on Open Studio:
    Figure 1.8 – Launching SageMaker Studio

    Don't worry if this takes a few more minutes, as SageMaker Studio needs to complete the first-run setup of your environment. As shown in the following screenshot, once we open SageMaker Studio, we see the familiar JupyterLab layout:

    Note

    SageMaker Studio is a living thing. By the time you're reading this, some screens may have been updated. Also, you may notice small differences from one region to the next, as some features or instance types are not available everywhere.

    Figure 1.9 – SageMaker Studio welcome screen

  8. We can immediately create our first notebook. In the Launcher tab, in the Notebooks and compute resources section, let's select Data Science, and click on Notebook – Python 3.
  9. This opens a notebook, as is visible in the following screenshot. We first check that SDKs are readily available. As this is the first time we are launching the Data Science kernel, we need to wait for a couple of minutes.

    Figure 1.10 – Checking the SDK version

  10. As is visible in the following screenshot, we can easily list resources that are currently running in our Studio instance: an ml.t3.medium instance, the data science image supporting the kernel used in our notebook, and the notebook itself:
    Figure 1.11 – Viewing Studio resources

  11. To avoid unnecessary costs, we should shut these resources down when we're done working with them. For example, we can shut down the instance and all resources running on it, as you can see in the following screenshot. Don't do it now, though: we'll need the instance to run the next examples!
    Figure 1.12 – Shutting down an instance

  12. ml.t3.medium is the default instance size that Studio uses. You can switch to other instance types by clicking on 2 vCPU + 4 GiB at the top of your notebook. This lets you select a new instance size and launch it in Studio. After a few minutes, the instance is up and your notebook code has been migrated automatically. Don't forget to shut down the previous instance, as explained earlier.
  13. When we're done working with SageMaker Studio, all we have to do is close the browser tab. If we want to resume working, we just have to go back to the SageMaker console and click on Open Studio.
  14. If we wanted to shut down the Studio instance itself, we'd simply select Shut Down in the File menu. All files would still be preserved until we deleted Studio completely in the SageMaker console.
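
The SDK check in step 9 can be done in a single notebook cell. The cell shown in Figure 1.10 isn't reproduced here, so the following is only a minimal sketch. It assumes that the SDKs in question are boto3 (the AWS SDK for Python) and the SageMaker Python SDK (the sagemaker package), and the sdk_versions helper is a name made up for this illustration. It reads the installed versions from package metadata, so it also runs on a machine where one of the packages is missing:

```python
from importlib import metadata


def sdk_versions(packages=("boto3", "sagemaker")):
    """Return a dict mapping each package name to its installed version.

    A package that is not installed maps to None instead of raising.
    The default package names are an assumption about which SDKs matter here.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per SDK, for example in the first cell of a new notebook.
for name, version in sdk_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If both SDKs come preinstalled in the Data Science kernel, as step 9 suggests, each line should show a version number rather than "not installed".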

Now that we've completed the setup, I'm sure you're impatient to get started with machine learning. Let's start deploying some models!