Machine Learning with Amazon SageMaker Cookbook

By: Joshua Arvin Lat

Overview of this book

Amazon SageMaker is a fully managed machine learning (ML) service that helps data scientists and ML practitioners manage ML experiments. In this book, you'll use the different capabilities and features of Amazon SageMaker to solve relevant data science and ML problems. This step-by-step guide features 80 proven recipes designed to give you the hands-on machine learning experience needed to contribute to real-world experiments and projects. You'll cover the algorithms and techniques that are commonly used when training and deploying NLP, time series forecasting, and computer vision models to solve ML problems. You'll explore various solutions for working with deep learning libraries and frameworks such as TensorFlow, PyTorch, and Hugging Face Transformers in Amazon SageMaker. You'll also learn how to use SageMaker Clarify, SageMaker Model Monitor, SageMaker Debugger, and SageMaker Experiments to debug, manage, and monitor multiple ML experiments and deployments. Moreover, you'll have a better understanding of how SageMaker Feature Store, Autopilot, and Pipelines can meet the specific needs of data science teams. By the end of this book, you'll be able to combine the different solutions you've learned as building blocks to solve real-world ML problems.

Querying data from the offline store of SageMaker Feature Store and uploading it to Amazon S3

In the previous recipe, we generated a synthetic dataset and stored it in SageMaker Feature Store using the ingest() function. In this recipe, we will demonstrate how to load that data back from the feature group's offline store. As we discussed in the Generating a synthetic dataset and using SageMaker Feature Store for storage and management recipe, the offline store is useful for use cases that involve loading a batch of records during the training phase. Accordingly, the training, validation, and test datasets will be loaded from the offline store, exported in CSV format, and then uploaded to S3.
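As a rough illustration of this query-and-upload flow, the following sketch uses the SageMaker Python SDK to run an Athena query against the feature group's offline store, split the result into training, validation, and test sets, and upload the CSV files to S3. The feature group name, bucket, key prefixes, and split proportions are placeholder assumptions, not necessarily the exact values used in this recipe.

```python
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
bucket = session.default_bucket()                  # assumption: default SageMaker bucket
feature_group_name = "<your-feature-group-name>"   # placeholder

feature_group = FeatureGroup(name=feature_group_name, sagemaker_session=session)

# Run an Athena query against the Glue table backing the offline store
query = feature_group.athena_query()
query.run(
    query_string=f'SELECT * FROM "{query.table_name}"',
    output_location=f"s3://{bucket}/athena-query-results/",
)
query.wait()
df = query.as_dataframe()

# Split into training, validation, and test sets (example 70/15/15 split)
train_df = df.iloc[: int(len(df) * 0.70)]
val_df = df.iloc[int(len(df) * 0.70) : int(len(df) * 0.85)]
test_df = df.iloc[int(len(df) * 0.85) :]

# Export each split to CSV and upload it to S3
for name, split_df in [("train", train_df), ("validation", val_df), ("test", test_df)]:
    filename = f"{name}.csv"
    split_df.to_csv(filename, index=False)
    session.upload_data(path=filename, bucket=bucket,
                        key_prefix=f"feature-store-splits/{name}")
```

In practice, you would usually replace the simple positional split above with whatever sampling or filtering logic your experiment requires before uploading the files.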

Important note

Note that you may need to wait a few minutes before the offline store data is available for querying if you've just finished ingesting data into the feature group in the Generating a synthetic dataset and using SageMaker Feature Store for storage and management recipe.
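If you prefer not to guess how long to wait, one option is a small polling helper such as the hypothetical wait_for_offline_store() below (not part of the recipe), which repeatedly issues a LIMIT 1 Athena query until the offline store returns at least one record.

```python
import time

def wait_for_offline_store(feature_group, output_location,
                           timeout=900, interval=60):
    """Poll the offline store until a query returns at least one record.

    Hypothetical helper: issues a LIMIT 1 Athena query every `interval`
    seconds against the feature group's offline store table and gives up
    after `timeout` seconds.
    """
    start = time.time()
    while time.time() - start < timeout:
        query = feature_group.athena_query()
        query.run(
            query_string=f'SELECT * FROM "{query.table_name}" LIMIT 1',
            output_location=output_location,
        )
        query.wait()
        if not query.as_dataframe().empty:
            return True          # data is available for querying
        time.sleep(interval)
    return False                 # timed out waiting for the offline store
```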