Getting Started with Amazon SageMaker Studio

By: Michael Hsieh

Overview of this book

Amazon SageMaker Studio is the first integrated development environment (IDE) for machine learning (ML) and is designed to integrate ML workflows: data preparation, feature engineering, statistical bias detection, automated machine learning (AutoML), training, hosting, ML explainability, monitoring, and MLOps in one environment. In this book, you'll start by exploring the features available in Amazon SageMaker Studio to analyze data, develop ML models, and productionize models to meet your goals. As you progress, you will learn how these features work together to address common challenges when building ML models in production. After that, you'll understand how to effectively scale and operationalize the ML life cycle using SageMaker Studio. By the end of this book, you'll have learned ML best practices for Amazon SageMaker Studio and will be able to improve productivity across the ML development life cycle and to build and deploy models easily for your ML use cases.
Table of Contents (16 chapters)
Part 1 – Introduction to Machine Learning on Amazon SageMaker Studio
Part 2 – End-to-End Machine Learning Life Cycle with SageMaker Studio
Part 3 – The Production and Operation of Machine Learning with SageMaker Studio

Exporting data for ML training

SageMaker Data Wrangler supports the following export options: Save to S3, Pipeline, Python Code, and Feature Store. The data transformations we have defined so far have not actually been applied to the data yet. The transformation steps need to be executed to produce the final transformed data. When we export our flow file with any of these options, SageMaker Data Wrangler automatically generates code and notebooks that guide us through the execution process, so we do not have to write any code ourselves, but we still have the flexibility to customize the generated code.
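The idea that transformations are first recorded and only later executed can be illustrated with a small, purely conceptual sketch in plain Python. This is not Data Wrangler's implementation, API, or flow file format; the names `flow`, `record_step`, and `export_flow`, as well as the sample rows, are hypothetical and exist only for illustration.

```python
# Conceptual sketch only: a hypothetical stand-in for a flow, NOT the
# Data Wrangler API or its flow file format.

# The "flow" is an ordered list of recorded steps. Recording a step
# stores the recipe but does not touch the data.
flow = []


def record_step(name, func):
    """Append a named transformation to the flow without running it."""
    flow.append((name, func))


def export_flow(rows):
    """Execute every recorded step in order and return the transformed rows."""
    for _name, func in flow:
        rows = func(rows)
    return rows


# Hypothetical raw data, made up for this example.
raw_rows = [
    {"age": "34", "city": " Seattle "},
    {"age": "", "city": "Austin"},
    {"age": "51", "city": "Boston "},
]

# Define three transformations. Each returns new rows instead of
# mutating its input, so the raw data stays intact.
record_step("drop_missing_age", lambda rows: [r for r in rows if r["age"] != ""])
record_step("strip_city", lambda rows: [{**r, "city": r["city"].strip()} for r in rows])
record_step("cast_age", lambda rows: [{**r, "age": int(r["age"])} for r in rows])

# At this point only the recipe exists; raw_rows is unchanged.
# Running the flow (the "export" in this sketch) produces the final data.
transformed_rows = export_flow(raw_rows)
print(transformed_rows)
```

In this sketch, `raw_rows` is left untouched and the transformed data appears only once `export_flow` runs, which mirrors the distinction between defining steps in a flow and executing them at export time.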

The four export options cover many use cases. Save to S3 is the most obvious one and offers a lot of flexibility: it places the transformed data in an S3 bucket so that you can train an ML model in Amazon SageMaker, and you can also download the data locally from S3 and import it into other tools if you need to. The Pipeline option creates a SageMaker pipeline that can easily be called as a repeatable workflow. Such...