Machine Learning on Kubernetes

By: Faisal Masood, Ross Brigoli

Overview of this book

MLOps is an emerging field that aims to bring repeatability, automation, and standardization of the software engineering domain to data science and machine learning engineering. By implementing MLOps with Kubernetes, data scientists, IT professionals, and data engineers can collaborate and build machine learning solutions that deliver business value for their organization. You'll begin by understanding the different components of a machine learning project. Then, you'll design and build a practical end-to-end machine learning project using open source software. As you progress, you'll understand the basics of MLOps and the value it can bring to machine learning projects. You will also gain experience in building, configuring, and using an open source, containerized machine learning platform. In later chapters, you will prepare data, build and deploy machine learning models, and automate workflow tasks using the same platform. Finally, the exercises in this book will help you get hands-on experience in Kubernetes and open source tools, such as JupyterHub, MLflow, and Airflow. By the end of this book, you'll have learned how to effectively build, train, and deploy a machine learning model using the machine learning platform you built.
Table of Contents (16 chapters)

Part 1: The Challenges of Adopting ML and Understanding MLOps (What and Why)
Part 2: The Building Blocks of an MLOps Platform and How to Build One on Kubernetes
Part 3: How to Use the MLOps Platform and Build a Full End-to-End Project Using the New Platform

Writing a Spark data pipeline

In this section, you will build a real data pipeline for gathering and processing datasets. The objective of the processing is to format, clean, and transform the data into a state that is usable for model training. Before writing our data pipeline, let's first understand the data.
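To give a sense of the shape of the pipeline we will build, here is a minimal PySpark sketch that reads a raw dataset, cleans it, and writes the result. The file paths and the dep_delay column name are hypothetical placeholders, not the actual flights data we will use later in this section.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Start (or reuse) a Spark session for the pipeline.
spark = SparkSession.builder.appName("flights-pipeline-sketch").getOrCreate()

# Read a raw CSV dataset (the path is a placeholder).
raw_df = spark.read.csv("/data/raw/flights.csv", header=True, inferSchema=True)

# Clean and transform: drop incomplete rows and cast a hypothetical
# delay column to an integer type suitable for model training.
clean_df = (
    raw_df.dropna()
          .withColumn("dep_delay", col("dep_delay").cast("int"))
)

# Persist the processed data as Parquet for the training step.
clean_df.write.mode("overwrite").parquet("/data/processed/flights")
```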

Preparing the environment

To perform the following exercises, you first need to set up two things: a PostgreSQL database to hold the historical flights data, and an S3 bucket in MinIO to which you will upload files. We use both a relational database and an S3 bucket to demonstrate how to gather data from disparate data sources.
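If you prefer to upload the files programmatically, a small script like the following can push them to MinIO through its S3-compatible API. The endpoint, credentials, bucket name, and file names here are assumptions for illustration; replace them with the values from your own MinIO installation.

```python
import boto3

# MinIO exposes an S3-compatible API, so the standard boto3 client works.
# The endpoint, credentials, bucket, and file below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.ml-platform.svc:9000",
    aws_access_key_id="minio",
    aws_secret_access_key="minio123",
)

bucket = "flights-data"
s3.create_bucket(Bucket=bucket)

# Upload a local file into the bucket.
s3.upload_file("flights.csv", bucket, "raw/flights.csv")
```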

We have prepared a Postgres database container image that you can run on your Kubernetes cluster. The container image is available at https://quay.io/repository/ml-on-k8s/flights-data. It runs a PostgreSQL database preloaded with flights data in a table named flights.
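Once the database is running, the flights table can be read into Spark over JDBC, roughly as sketched below. The service hostname, database name, and credentials are assumptions that depend on how you deploy the container, and the PostgreSQL JDBC driver must be available on the Spark classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-flights").getOrCreate()

# Read the preloaded flights table over JDBC. The hostname, database name,
# and credentials are placeholders for your own deployment.
flights_df = (
    spark.read.format("jdbc")
         .option("url", "jdbc:postgresql://flights-db:5432/postgres")
         .option("dbtable", "flights")
         .option("user", "postgres")
         .option("password", "postgres")
         .option("driver", "org.postgresql.Driver")
         .load()
)

flights_df.show(5)
```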

Go through the following steps...