Productionizing the batch model pipeline
In Chapter 4, Adding Feature Store to ML Models, we trained the model on the features ingested by the feature engineering notebook. We also created a model-scoring notebook that fetches the features for a set of customers from Feast and runs predictions against them using the trained model. For this exercise, let's assume the raw data has a freshness latency of one day. That means the features must be regenerated once a day, and the model must score customers against those fresh features once a day, storing the results in an S3 bucket for consumption. Because we organized and decoupled the stages early on, all we need to do to achieve this is run the feature engineering and model-scoring notebooks/Python scripts consecutively once a day. Now that we also have a tool to perform this, let's go ahead and schedule this workflow in the Airflow environment.
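Before wiring this into Airflow, it helps to see that the daily workflow is just two steps with a strict ordering. The following is a minimal plain-Python sketch, not Airflow code; the function bodies and return values are hypothetical placeholders standing in for the Chapter 4 notebooks. It models the dependency the Airflow DAG will enforce: feature engineering must finish before model scoring starts.

```python
# Minimal sketch of the daily batch workflow as an ordered list of
# tasks. Plain Python, not Airflow code; all function bodies are
# hypothetical placeholders for the Chapter 4 notebooks/scripts.

def feature_engineering() -> str:
    # Real pipeline: rerun the feature engineering notebook/script to
    # regenerate features from the day's raw data and ingest into Feast.
    return "features ingested"

def model_scoring() -> str:
    # Real pipeline: fetch the fresh features from Feast, score the
    # customers with the trained model, and write results to the S3 bucket.
    return "scores written to S3"

# Tasks listed in dependency order: scoring must not start until
# feature engineering has finished, or it would score stale features.
DAILY_TASKS = [feature_engineering, model_scoring]

def run_daily_pipeline() -> list:
    # Run the tasks consecutively, as the scheduler will once a day.
    return [task() for task in DAILY_TASKS]
```

In Airflow, these two steps become two operators in one DAG scheduled daily, chained with the `feature_engineering >> model_scoring` dependency syntax so the scheduler enforces the same ordering.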
The following figure displays how we will operationalize the batch model pipeline.