Getting Started with Amazon SageMaker Studio

By: Michael Hsieh

Overview of this book

Amazon SageMaker Studio is the first integrated development environment (IDE) for machine learning (ML) and is designed to integrate ML workflows: data preparation, feature engineering, statistical bias detection, automated machine learning (AutoML), training, hosting, ML explainability, monitoring, and MLOps in one environment. In this book, you'll start by exploring the features available in Amazon SageMaker Studio to analyze data, develop ML models, and productionize models to meet your goals. As you progress, you will learn how these features work together to address common challenges when building ML models in production. After that, you'll learn how to scale and operationalize the ML life cycle effectively using SageMaker Studio. By the end of this book, you'll have learned ML best practices for Amazon SageMaker Studio and be able to improve productivity in the ML development life cycle and build and deploy models easily for your ML use cases.
Table of Contents (16 chapters)
Part 1 – Introduction to Machine Learning on Amazon SageMaker Studio
Part 2 – End-to-End Machine Learning Life Cycle with SageMaker Studio
Part 3 – The Production and Operation of Machine Learning with SageMaker Studio

Importing data from sources

The first step in the data preparation journey is to import data from one or more sources. Data can be imported from four sources: Amazon S3, Amazon Athena, Amazon Redshift, and Snowflake. Amazon S3 is an object storage service where developers can store virtually any kind of data, including text files, spreadsheets, archives, and ML models. Amazon Athena is an analytics service that gives developers an interactive, serverless, SQL-based query experience for data stored in Amazon S3. Amazon Redshift is a data warehouse service that makes it easy to query and process exabytes of data. Snowflake is a data warehouse service from Snowflake Inc. In this chapter, we will import data from Amazon S3 and Amazon Athena, the two most common data sources. We have two tables in CSV format saved in the SageMaker default S3 bucket and a table available in Amazon Athena, as prepared in the chapter03/1-prepare_data.ipynb notebook.
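
Before importing, it can be useful to confirm that the CSV files are where we expect them in the SageMaker default S3 bucket. The following is a minimal sketch rather than code from the book: the helper names `to_csv_uris` and `list_csv_uris` are hypothetical, and it assumes the `sagemaker` and `boto3` packages and valid AWS credentials are available, as they typically are in a SageMaker Studio notebook.

```python
from typing import Iterable, List


def to_csv_uris(bucket: str, keys: Iterable[str]) -> List[str]:
    """Keep only keys ending in .csv and turn them into s3:// URIs."""
    return [
        f"s3://{bucket}/{key}"
        for key in keys
        if key.lower().endswith(".csv")
    ]


def list_csv_uris(prefix: str) -> List[str]:
    """List CSV objects under `prefix` in the SageMaker default bucket.

    Needs AWS credentials. The AWS imports are local to this function so
    that to_csv_uris can be used without the AWS SDKs installed.
    """
    import boto3
    import sagemaker

    bucket = sagemaker.Session().default_bucket()
    paginator = boto3.client("s3").get_paginator("list_objects_v2")

    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no objects match.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return to_csv_uris(bucket, keys)
```

Calling `list_csv_uris("my-prefix/")`, where `my-prefix/` stands in for whatever prefix your own notebook used, would return the S3 URIs of the CSV tables found there. Those are the locations you would then browse to when importing from S3.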

Importing from S3

...