Feature Store for Machine Learning

By: Jayanth Kumar M J

Overview of this book

A feature store is one of the storage layers in machine learning (ML) operations, where data scientists and ML engineers can store transformed and curated features for ML models. This makes the features available for model training, inference (batch and online), and reuse in other ML pipelines. Knowing how to use feature stores to their fullest potential can save you a lot of time and effort, and this book will teach you everything you need to know to get started.

Feature Store for Machine Learning is for data scientists who want to learn how to use feature stores to share and reuse each other's work and expertise. You’ll be able to implement practices that help eliminate reprocessing of data, make models reproducible, and reduce duplication of work, thus shortening the ML model's time to production.

While this book offers some theoretical groundwork for developers who are just getting to grips with feature stores, there's plenty of practical know-how for those ready to put their knowledge to work. With a hands-on approach to implementation and associated methodologies, you'll get up and running in no time. By the end of this book, you’ll understand why feature stores are essential and how to use them in your ML projects, both on your local system and in the cloud.

Table of Contents (13 chapters)

1. Section 1 – Why Do We Need a Feature Store?
4. Section 2 – A Feature Store in Action
9. Section 3 – Alternatives, Best Practices, and a Use Case

What this book covers

Chapter 1, An Overview of the Machine Learning Life Cycle, starts with a brief introduction to ML and then dives deep into an ML use case – a customer lifetime value model. The chapter runs through the different stages of ML development and then discusses the most time-consuming parts of ML, as well as how ML development looks in an ideal world compared with the real world.

Chapter 2, What Problems Do Feature Stores Solve?, introduces us to the main focus of the book, which is feature management and feature stores. It discusses the importance of features in production systems, different ways to bring features into production, and common issues with these approaches, followed by how a feature store can overcome these common issues.

Chapter 3, Feature Store Fundamentals, Terminology, and Usage, starts with an introduction to an open source feature store – Feast – followed by installation, different terminology used in the feature store world, and basic API usage. Finally, it briefly introduces different components that work together in Feast.

Chapter 4, Adding Feature Store to ML Models, helps readers install Feast on AWS, walking step by step, with screenshots, through the creation of the required resources, such as S3 buckets, a Redshift cluster, and the Glue catalog. Finally, it revisits the feature engineering aspect of the customer lifetime value model developed in Chapter 1, An Overview of the Machine Learning Life Cycle, and creates and ingests the curated features into Feast.

Chapter 5, Model Training and Inference, continues from where we left off in Chapter 4, Adding Feature Store to ML Models, and discusses how a feature store can help data scientists and ML engineers collaborate in the development of an ML model. It discusses how to use Feast for batch model inference and also how to build a REST API for online model inference.

Chapter 6, Model to Production and Beyond, discusses the creation of an orchestration environment using Amazon Managed Workflows for Apache Airflow (MWAA), uses the feature engineering, model training, and inference code/notebooks built in the previous chapters, and deploys the batch and online model pipelines into production. Finally, it discusses aspects beyond production, such as feature monitoring, changes to feature definitions, and also building the next ML model.

Chapter 7, Feast Alternatives and ML Best Practices, introduces other feature stores, such as Tecton, Databricks Feature Store, Google Cloud's Vertex AI, Hopsworks Feature Store, and Amazon SageMaker Feature Store. It also introduces the basic usage of the latter so that users can get the gist of what it is like to use a managed feature store. Finally, it briefly discusses ML best practices.

Chapter 8, Use Case – Customer Churn Prediction, uses Amazon SageMaker's managed feature store offering and runs through an end-to-end use case to predict customer churn on a telecom dataset. It also covers examples of feature drift monitoring and model performance monitoring.