Machine Learning on Kubernetes

By: Faisal Masood, Ross Brigoli

Overview of this book

MLOps is an emerging field that aims to bring repeatability, automation, and standardization of the software engineering domain to data science and machine learning engineering. By implementing MLOps with Kubernetes, data scientists, IT professionals, and data engineers can collaborate and build machine learning solutions that deliver business value for their organization. You'll begin by understanding the different components of a machine learning project. Then, you'll design and build a practical end-to-end machine learning project using open source software. As you progress, you'll understand the basics of MLOps and the value it can bring to machine learning projects. You will also gain experience in building, configuring, and using an open source, containerized machine learning platform. In later chapters, you will prepare data, build and deploy machine learning models, and automate workflow tasks using the same platform. Finally, the exercises in this book will help you get hands-on experience in Kubernetes and open source tools, such as JupyterHub, MLflow, and Airflow. By the end of this book, you'll have learned how to effectively build, train, and deploy a machine learning model using the machine learning platform you built.
Table of Contents (16 chapters)
Part 1: The Challenges of Adopting ML and Understanding MLOps (What and Why)
Part 2: The Building Blocks of an MLOps Platform and How to Build One on Kubernetes
Part 3: How to Use the MLOps Platform and Build a Full End-to-End Project Using the New Platform

What this book covers

Chapter 1, Challenges in Machine Learning, discusses the challenges organizations face in adopting ML and why a good number of ML initiatives may not deliver the expected outcomes. The chapter further discusses the top few reasons why organizations face these challenges.

Chapter 2, Understanding MLOps, builds on the problems identified in Chapter 1, Challenges in Machine Learning, and discusses how we can tackle the challenges in adopting ML. The chapter defines MLOps and explains how it helps organizations get value out of their ML initiatives. It also provides a blueprint for how companies can adopt MLOps in their ML projects.

Chapter 3, Exploring Kubernetes, first describes why we have chosen Kubernetes as the basis for MLOps in this book. The chapter then defines the core concepts of Kubernetes and helps you create an environment where the code can be tested. The world is changing fast, and part of this high-velocity disruption is the availability of the cloud and cloud-based solutions. This chapter provides an overview of how a Kubernetes-based platform can give you the flexibility to run your solution anywhere.

Chapter 4, The Anatomy of a Machine Learning Platform, takes a 1,000-foot view of what an ML platform looks like. You already know what problems MLOps solves. This chapter defines the components of an MLOps platform in a technology-agnostic way. You will build a solid foundation on the core components of an MLOps platform.

Chapter 5, Data Engineering, covers an important part of any ML project that is often missed. A good number of ML tutorials/books start with a clean dataset, maybe a CSV file to build your model against. The real world is different. Data comes in many shapes and sizes and it is important that you have a well-defined strategy to harvest, process, and prepare data at scale. This chapter will define the role data engineering plays in a successful ML project. It will discuss OSS tools that can provide the basis for data engineering. The chapter will then talk about how you can install these toolsets on the Kubernetes platform.

Chapter 6, Machine Learning Engineering, will move the discussion to the model building, tuning, and deployment activities of an ML development life cycle. The chapter will discuss providing a self-service solution to data scientists so they can work more efficiently and collaborate with data engineering teams and fellow data scientists using the same platform. It will also discuss OSS tools that can provide the basis for model development. The chapter will then talk about how you can install these toolsets on the Kubernetes platform.

Chapter 7, Model Deployment and Automation, covers the deployment phase of the ML project life cycle. The model you build knows the data you provided to it. In the real world, however, the data changes. This chapter discusses the tools and techniques to monitor your model performance. This performance data could be used to decide whether the model needs retraining on a new dataset or whether it's time to build a new model for the given problem.

Chapter 8, Building a Complete ML Project Using the Platform, will define a typical ML project and how each component of the platform is utilized in every step of the project life cycle. The chapter will define the outcomes and requirements of the project and focus on how the MLOps platform facilitates the project life cycle.

Chapter 9, Building Your Data Pipeline, will show how a Spark cluster can be used to ingest and process data. The chapter will show how the platform enables the data engineer to read raw data from any storage, process it, and write it back to another storage. The main focus is to demonstrate how a Spark cluster can be created on demand and how workloads can be isolated in a shared environment.

Chapter 10, Building, Deploying, and Monitoring Your Model, will show how the JupyterHub server can be used to build, train, and tune models on the platform. The chapter will show how the platform enables the data scientist to perform the modeling activities in a self-service fashion. This chapter will also introduce MLflow as the model experiment tracking and model registry component. Now that you have a working model, how do you share it for other teams to consume? This chapter will show how the Seldon Core component allows non-programmers to expose their models as REST APIs. You will see how the deployed APIs automatically scale out using the Kubernetes capabilities.

Chapter 11, Machine Learning on Kubernetes, will take you through some of the key ideas to carry forward as you further your knowledge of the subject. This chapter will cover identifying use cases for the ML platform, operationalizing ML, and running on Kubernetes.