
MLOps with Red Hat OpenShift

By: Ross Brigoli, Faisal Masood

Overview of this book

MLOps with Red Hat OpenShift offers practical insights for implementing MLOps workflows on the dynamic OpenShift platform. As organizations worldwide seek to harness the power of machine learning operations, this book lays the foundation for your MLOps success. It starts with key MLOps concepts, including data preparation, model training, and deployment, and then introduces the OpenShift capabilities you will rely on, beginning with a primer on containers, pods, operators, and more. With the groundwork in place, you’ll move on to MLOps workflows and see how popular machine learning frameworks are used to train and test models on the platform. Later chapters focus on Red Hat OpenShift Data Science, an open-source data science and machine learning platform, and its partner components, such as Pachyderm and Intel OpenVINO, explaining their roles in building and managing data pipelines and in deploying and monitoring machine learning models. By the end of the book, you’ll be able to implement MLOps workflows on the OpenShift platform proficiently.
Table of Contents (13 chapters)
Part 1: Introduction
Part 2: Provisioning and Configuration
Part 3: Operating ML Workloads

Versioning your data with Pachyderm

Data is a fundamental component of every model you build. Without a retrievable version of the dataset a model was trained on, you cannot repeat a past training run and expect the same results. Data versioning lets you compare datasets across versions and removes any ambiguity about which data a given model was trained on after the data changes. This allows us to build a reproducible model training workflow. For an in-depth look at Pachyderm, refer to the Pachyderm documentation at https://docs.pachyderm.com/.
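Pachyderm handles data versioning for you, but the core idea can be illustrated without it. The following minimal Python sketch is not Pachyderm’s API; the helper names dataset_fingerprint, record_training_run, and same_training_data are hypothetical. It identifies a dataset version by the SHA-256 hash of its bytes and stores that hash next to a training run's parameters, so two runs can be checked for whether they used identical data.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a dataset file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_training_run(run_id: str, dataset_path: str, params: dict) -> dict:
    """Build a metadata record tying a training run to the exact bytes it was trained on."""
    return {
        "run_id": run_id,
        "dataset": Path(dataset_path).name,
        "dataset_sha256": dataset_fingerprint(dataset_path),
        "params": params,
    }


def same_training_data(run_a: dict, run_b: dict) -> bool:
    """Two runs saw the same data only if their dataset hashes match."""
    return run_a["dataset_sha256"] == run_b["dataset_sha256"]
```

A content hash only tells you whether two datasets differ; a tool such as Pachyderm additionally keeps every earlier version retrievable, which is what makes re-running an old training job possible.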

To work with Pachyderm, you can use either the Pachyderm command-line tool, pachctl, or the Pachyderm Python library. This book uses the Python library.
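Although this book uses the Python library, it can help to see what the equivalent steps look like with pachctl. The sketch below is an illustration under stated assumptions, not the book’s code: it builds the pachctl commands that create a repository, commit a local file to a branch, and list that branch’s commits. The repository name wine, the branch master, and the helper names are hypothetical, and the command syntax follows recent pachctl releases, so check pachctl --help for your version. Commands are only printed unless dry_run is set to False, in which case pachctl must be installed and connected to your cluster.

```python
import shlex
import subprocess
from typing import List


def pachctl_ingest_commands(repo: str, branch: str, local_file: str, dest_path: str) -> List[List[str]]:
    """Build pachctl invocations that version one local file in a Pachyderm repo (hypothetical helper)."""
    target = f"{repo}@{branch}:{dest_path}"
    return [
        ["pachctl", "create", "repo", repo],                   # create the data repository
        ["pachctl", "put", "file", target, "-f", local_file],  # commit the local file to the branch
        ["pachctl", "list", "commit", f"{repo}@{branch}"],     # show the branch's commit history
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (requires pachctl on PATH)."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Each put file call produces a new commit on the branch, so the commit listing is where the successive versions of the dataset become visible.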

Before we start, create a new bucket named raw-data in your MinIO server; we will use it to store the datasets. Then, upload the wine.csv file from this book’s Git repository into this bucket. For the purpose of this exercise, set the raw-data bucket...
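If you prefer to script the bucket creation and upload rather than do them by hand, the following sketch uses the MinIO Python SDK (the minio package; its 7.x API is assumed) to create the raw-data bucket if it does not already exist and upload wine.csv into it. The endpoint, credentials, and the helper names make_minio_client and upload_dataset are assumptions for illustration, and the remaining bucket configuration for this exercise is not covered.

```python
from typing import Any


def make_minio_client(endpoint: str, access_key: str, secret_key: str, secure: bool = False) -> Any:
    """Create a MinIO client (hypothetical helper); endpoint is host:port without a scheme."""
    # Imported lazily so upload_dataset() can be used, and tested, without the minio package.
    from minio import Minio  # pip install minio

    return Minio(endpoint, access_key=access_key, secret_key=secret_key, secure=secure)


def upload_dataset(client: Any, bucket: str = "raw-data",
                   file_path: str = "wine.csv", object_name: str = "wine.csv") -> None:
    """Create the bucket if it is missing, then upload the local file into it (hypothetical helper)."""
    # Checking first makes the step safe to re-run against an existing bucket.
    if not client.bucket_exists(bucket_name=bucket):
        client.make_bucket(bucket_name=bucket)
    # Upload the local file; the object keeps the same name inside the bucket.
    client.fput_object(bucket_name=bucket, object_name=object_name, file_path=file_path)
```

For example, calling upload_dataset(make_minio_client("localhost:9000", "<access-key>", "<secret-key>")) from the directory containing wine.csv would create the bucket and upload the file, assuming your MinIO server is reachable at that address. Passing the client in as a parameter keeps the upload logic independent of how the connection is configured.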