Getting Started with Amazon SageMaker Studio

By: Michael Hsieh

Overview of this book

Amazon SageMaker Studio is the first integrated development environment (IDE) for machine learning (ML) and is designed to integrate ML workflows in one environment: data preparation, feature engineering, statistical bias detection, automated machine learning (AutoML), training, hosting, ML explainability, monitoring, and MLOps. In this book, you'll start by exploring the features available in Amazon SageMaker Studio to analyze data, develop ML models, and productionize models to meet your goals. As you progress, you will learn how these features work together to address common challenges when building ML models in production. After that, you'll learn how to effectively scale and operationalize the ML life cycle using SageMaker Studio. By the end of this book, you'll have learned ML best practices for Amazon SageMaker Studio, and you'll be able to improve productivity in the ML development life cycle and easily build and deploy models for your ML use cases.
Table of Contents (16 chapters)
Part 1 – Introduction to Machine Learning on Amazon SageMaker Studio
Part 2 – End-to-End Machine Learning Life Cycle with SageMaker Studio
Part 3 – The Production and Operation of Machine Learning with SageMaker Studio

Detecting bias in ML

For this chapter, I'd like to use the Adult census income dataset from the University of California Irvine (UCI) ML repository (https://archive.ics.uci.edu/ml/datasets/adult). This dataset contains demographic information from census data, with income level as the prediction target. The goal is to predict, based on the census information, whether a person earns more than 50,000 United States dollars (USD) ($50K) per year or not. This is a great example of the type of ML use case that includes socially sensitive categories such as gender and race, and such use cases are under the most scrutiny and regulation to ensure fairness when producing an ML model.

In this section, we will analyze the dataset to detect bias in the training data, mitigate any bias we find, train an ML model, and analyze whether the model is biased against a particular group.
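To make the idea of bias in the training data concrete before opening the notebook, here is a minimal sketch of two commonly used pretraining bias measures: class imbalance (how unevenly two groups are represented) and the difference in proportions of positive labels between the groups. This is not the notebook's code. The records are made up for illustration and are not rows from the UCI dataset, and the function names and the choice of "Female" as the group of interest are assumptions for this example.

```python
from collections import Counter

# Toy records, made up for illustration (NOT rows from the UCI Adult dataset).
# Each record is (facet value, label), where label 1 means income above $50K.
records = [
    ("Male", 1), ("Male", 1), ("Male", 1), ("Male", 0), ("Male", 0), ("Male", 0),
    ("Female", 1), ("Female", 0), ("Female", 0), ("Female", 0),
]


def class_imbalance(records, group):
    """(n_a - n_d) / (n_a + n_d), where n_d counts `group` and n_a everyone else.

    Values near 0 mean balanced representation; values near +1 mean
    `group` is heavily under-represented in the data.
    """
    counts = Counter(facet for facet, _ in records)
    n_d = counts[group]
    n_a = len(records) - n_d
    return (n_a - n_d) / (n_a + n_d)


def diff_in_positive_proportions(records, group):
    """q_a - q_d: share of positive labels outside `group` minus the share inside it."""
    labels_d = [label for facet, label in records if facet == group]
    labels_a = [label for facet, label in records if facet != group]
    q_d = sum(labels_d) / len(labels_d)
    q_a = sum(labels_a) / len(labels_a)
    return q_a - q_d


ci = class_imbalance(records, "Female")
dpl = diff_in_positive_proportions(records, "Female")
print(f"class imbalance: {ci:.2f}")
print(f"difference in positive proportions: {dpl:.2f}")
```

A positive class imbalance here means the chosen group has fewer rows than the rest of the data, and a positive difference in proportions means the chosen group receives the favorable label less often. Either signal is a reason to look closer at the training data before training a model on it.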

Detecting pretraining bias

Please open the notebook in Getting-Started-with-Amazon-SageMaker...