Getting Started with Amazon SageMaker Studio

By: Michael Hsieh

Overview of this book

Amazon SageMaker Studio is the first integrated development environment (IDE) for machine learning (ML) and is designed to integrate ML workflows: data preparation, feature engineering, statistical bias detection, automated machine learning (AutoML), training, hosting, ML explainability, monitoring, and MLOps in one environment. In this book, you'll start by exploring the features available in Amazon SageMaker Studio to analyze data, develop ML models, and productionize models to meet your goals. As you progress, you will learn how these features work together to address common challenges when building ML models in production. After that, you'll learn how to effectively scale and operationalize the ML life cycle using SageMaker Studio. By the end of this book, you'll have learned ML best practices for Amazon SageMaker Studio, and you'll be able to improve productivity across the ML development life cycle and build and deploy models easily for your ML use cases.

Table of Contents (16 chapters)
Part 1 – Introduction to Machine Learning on Amazon SageMaker Studio
Part 2 – End-to-End Machine Learning Life Cycle with SageMaker Studio
Part 3 – The Production and Operation of Machine Learning with SageMaker Studio

Understanding the concept of a feature store

Consider the following scenario: you are a data scientist working on an ML project in the automotive industry with a fellow data scientist and a few data engineers. You are responsible for modeling vehicle fuel efficiency, while your fellow data scientist is responsible for modeling vehicle performance. Both of you use data from the car manufacturers that your company works with as input to your models. The data engineers on the team preprocess this data and store it in the cloud.

The data is stored in disparate sources, such as Amazon S3, Amazon Relational Database Service (RDS), and a data lake built on AWS, depending on the nature of the source data. You and your fellow data scientist have been reaching out separately to the data engineering team to get the data processed in certain ways that work best for your respective modeling exercises. You do not realize that your fellow data scientist's models actually share...