Practical Deep Learning at Scale with MLflow

By: Yong Liu

Overview of this book

The book starts with an overview of the deep learning (DL) life cycle and the emerging Machine Learning Operations (MLOps) field, providing a clear picture of the four pillars of deep learning (data, model, code, and explainability) and the role of MLflow in these areas. From there onward, it guides you step by step in understanding the concept of MLflow experiments and usage patterns, using MLflow as a unified framework to track DL data, code and pipelines, models, parameters, and metrics at scale. You’ll also tackle running DL pipelines in a distributed execution environment with reproducibility and provenance tracking, and tuning DL models through hyperparameter optimization (HPO) with Ray Tune, Optuna, and HyperBand. As you progress, you’ll learn how to build a multi-step DL inference pipeline with preprocessing and postprocessing steps, deploy a DL inference pipeline for production using Ray Serve and AWS SageMaker, and finally create a DL explanation as a service (EaaS) using the popular Shapley Additive Explanations (SHAP) toolbox. By the end of this book, you’ll have built the foundation and gained the hands-on experience you need to develop a DL pipeline solution from initial offline experimentation to final deployment and production, all within a reproducible and open source framework.
Table of Contents (17 chapters)

Section 1 – Deep Learning Challenges and MLflow Prime
Section 2 – Tracking a Deep Learning Pipeline at Scale
Section 3 – Running Deep Learning Pipelines at Scale
Section 4 – Deploying a Deep Learning Pipeline at Scale
Section 5 – Deep Learning Model Explainability at Scale

What this book covers

Chapter 1, Deep Learning Life Cycle and MLOps Challenges, covers the five stages of the full DL life cycle and builds the first DL model in this book using a transfer learning approach for text sentiment classification. It also defines the concept of MLOps along with its three foundation layers and four pillars, and the roles of MLflow in these areas. An overview of the challenges in DL data, model, code, and explainability is also presented. This chapter is designed to bring everyone to the same foundational level and provides clarity and guidelines on the scope of the rest of the book.
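
The kind of transfer-learning setup that first model relies on can be sketched roughly as follows. This is only an illustrative sketch using the Hugging Face Transformers and Datasets libraries with a DistilBERT backbone and the public IMDb dataset, not necessarily the exact stack or dataset used in the chapter.

```python
# Illustrative transfer-learning sketch for text sentiment classification
# (assumed stack: Hugging Face Transformers + Datasets; the book's own
# example may use a different library and dataset).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # public sentiment dataset, used as a stand-in
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

encoded = dataset.map(tokenize, batched=True)

# Reuse the pretrained language model weights; only the new classification
# head is randomly initialized -- this is the transfer-learning step.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="outputs", num_train_epochs=1),
    train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()
```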

Chapter 2, Getting Started with MLflow for Deep Learning, serves as an MLflow primer and a first hands-on learning module to quickly set up a local filesystem-based MLflow tracking server or interact with a remote managed MLflow tracking server in Databricks, and to perform a first DL experiment using MLflow autologging. It also explains foundational MLflow concepts through concrete examples, such as experiments, runs, the metadata of and relationship between experiments and runs, code tracking, model logging, and model flavors. Specifically, we underline that experiments should be first-class entities that can be used to bridge the gap between the offline and online production life cycles of DL models. This chapter builds the foundational knowledge of MLflow.
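
As a taste of what that first experiment looks like, here is a minimal sketch of pointing MLflow at a tracking server, naming an experiment, and enabling autologging; the tracking URI, experiment name, and logged values are placeholders.

```python
# Minimal sketch of a first MLflow-tracked DL experiment.
# Assumes a tracking server at http://localhost:5000 (use your Databricks
# workspace URI for the managed scenario); names and values are placeholders.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("dl_sentiment_classifier")  # experiments as first-class entities

mlflow.autolog()  # let supported DL frameworks log parameters, metrics, and models

with mlflow.start_run(run_name="first_dl_run") as run:
    # ... train the model here; autologging captures most of the tracking ...
    mlflow.log_param("learning_rate", 2e-5)   # manual logging still works alongside
    mlflow.log_metric("val_accuracy", 0.91)
    print(f"run_id: {run.info.run_id}")
```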

Chapter 3, Tracking Models, Parameters, and Metrics, covers the first in-depth learning module on tracking, using a fully fledged local MLflow tracking server. It starts with setting up this tracking server in Docker Desktop, with a MySQL backend store and a MinIO artifact store. Before implementing tracking, the chapter introduces an open provenance tracking framework based on the open provenance model vocabulary specification and presents six types of provenance questions that could be answered using MLflow. It then provides hands-on implementation examples of how to use MLflow's model-logging APIs and registry APIs to track model provenance, model metrics, and parameters, with or without autologging. Unlike typical MLflow API tutorials, which only provide guidance on using the APIs, this chapter focuses on how successfully MLflow can be used to answer the provenance questions. By the end of this chapter, we can answer four of the six provenance questions; the remaining two can only be answered once we have a multi-step pipeline or a deployment to production, which are covered in later chapters.
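
The provenance-oriented tracking the chapter builds toward centers on MLflow's model-logging and registry APIs; a rough sketch of the pattern, with placeholder parameter values and model names, looks like this.

```python
# Sketch of logging a model and registering it so provenance questions such as
# "which run produced this model version?" can be answered later.
# Assumes a full tracking server with a database backend (required by the
# model registry); the model, names, and values below are placeholders.
import mlflow
import mlflow.pytorch
import torch

mlflow.set_tracking_uri("http://localhost:5000")
model = torch.nn.Linear(4, 2)  # placeholder for the fine-tuned DL model

with mlflow.start_run() as run:
    mlflow.log_params({"epochs": 3, "lr": 2e-5})
    mlflow.log_metric("f1", 0.88)
    mlflow.pytorch.log_model(model, artifact_path="model")

# Registering the logged model creates a version that links back to run_id,
# which is exactly the provenance link the chapter cares about.
model_uri = f"runs:/{run.info.run_id}/model"
mlflow.register_model(model_uri, "nlp_sentiment_classifier")
```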

Chapter 4, Tracking Code and Data Versioning, covers the second in-depth learning module on MLflow tracking. It analyzes current practices around the usage of notebooks and pipelines in ML/DL projects. It recommends using VS Code notebooks and shows a concrete DL notebook example that can be run either interactively or non-interactively with MLflow tracking enabled. It also recommends using MLflow's MLproject to implement a multi-step DL pipeline through MLflow's entry points and pipeline chaining, and a three-step DL pipeline is created for DL model training and registration. In addition, it shows pipeline-level tracking and individual step tracking through parent-child nested runs in MLflow. Finally, it shows how to track publicly available and privately built Python libraries and data versioning in Delta Lake using MLflow.
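
The parent-child nesting that links a pipeline run to its step runs can be sketched as follows. In the chapter, the steps are MLproject entry points chained with mlflow.run(); this simplified sketch inlines the steps to show only the nesting mechanism, with placeholder step names and values.

```python
# Sketch of pipeline-level vs. step-level tracking with parent-child nested
# runs. In the chapter, each step is an MLproject entry point invoked via
# mlflow.run(); here the steps are inlined to show the nesting mechanism.
import mlflow

with mlflow.start_run(run_name="dl_pipeline"):                 # pipeline-level run
    with mlflow.start_run(run_name="download_data", nested=True):
        mlflow.log_param("dataset", "imdb_reviews")            # step-level tracking
    with mlflow.start_run(run_name="fine_tune", nested=True):
        mlflow.log_metric("val_accuracy", 0.90)
    with mlflow.start_run(run_name="register_model", nested=True):
        mlflow.log_param("registered_name", "nlp_sentiment_classifier")
```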

Chapter 5, Running DL Pipelines in Different Environments, covers how to run a DL pipeline in different execution environments. It starts with the scenarios and requirements for executing DL pipelines in different environments. It then shows how to use MLflow's command-line interface (CLI) to submit runs in four scenarios: running locally with local code, running locally with remote code in GitHub, running remotely in the cloud with local code, and running remotely in the cloud with remote code in GitHub. The flexibility and reproducibility that MLflow provides for executing a DL pipeline also supply the building blocks for continuous integration/continuous deployment (CI/CD) automation when needed.
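
The same four scenarios can also be expressed with MLflow's Python projects API, which mirrors the mlflow run CLI the chapter uses; the GitHub URL, entry-point name, and Databricks cluster spec file below are placeholders.

```python
# Sketch of the four execution scenarios via MLflow's projects API (the
# chapter uses the equivalent `mlflow run` CLI). The GitHub URL, entry-point
# name, and backend config file are placeholders.
import mlflow

# 1. Run locally with local code (an MLproject in the current directory).
mlflow.run(".", entry_point="main")

# 2. Run locally with remote code hosted in GitHub.
mlflow.run("https://github.com/your-org/your-dl-project", entry_point="main")

# 3./4. Run remotely in the cloud (for example, Databricks) with local or
# GitHub-hosted code by choosing a remote backend and a cluster spec.
mlflow.run(
    "https://github.com/your-org/your-dl-project",
    entry_point="main",
    backend="databricks",
    backend_config="cluster_spec.json",  # placeholder cluster definition
)
```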

Chapter 6, Running Hyperparameter Tuning at Scale, covers using MLflow to support HPO at scale with state-of-the-art HPO frameworks such as Ray Tune. It starts with a review of the types and challenges of DL pipeline hyperparameters. It then compares three HPO frameworks, Ray Tune, Optuna, and HyperOpt, analyzing their pros and cons and the maturity of their integration with MLflow. It then recommends and shows how to use Ray Tune with MLflow to run HPO for the DL model we have been working on throughout this book, and covers how to switch to other HPO search and scheduler algorithms such as Optuna and HyperBand. This enables us to produce high-performance DL models that meet business requirements in a cost-effective and scalable way.
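
A rough sketch of the Ray Tune plus MLflow pattern is shown below; the import paths for the MLflow callback and the scheduler follow the Ray 1.x layout of that era and may differ in newer Ray releases, and the training function, metric, and search space are placeholders.

```python
# Sketch of HPO with Ray Tune logging every trial to MLflow. Import paths for
# the MLflow integration vary across Ray versions; adjust as needed.
from ray import tune
from ray.tune.integration.mlflow import MLflowLoggerCallback
from ray.tune.schedulers import ASHAScheduler  # a HyperBand-style scheduler

def train_fn(config):
    # ... build and fine-tune the DL model using config["lr"], config["batch_size"] ...
    val_loss = 0.35  # placeholder; report the real validation loss here
    tune.report(val_loss=val_loss)

analysis = tune.run(
    train_fn,
    config={
        "lr": tune.loguniform(1e-5, 1e-3),
        "batch_size": tune.choice([16, 32, 64]),
    },
    num_samples=20,
    scheduler=ASHAScheduler(metric="val_loss", mode="min"),
    callbacks=[MLflowLoggerCallback(
        tracking_uri="http://localhost:5000",
        experiment_name="dl_hpo",
        save_artifact=True,
    )],
)
print(analysis.get_best_config(metric="val_loss", mode="min"))
```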

Chapter 7, Multi-Step Deep Learning Inference Pipeline, covers creating a multi-step inference pipeline using MLflow's custom Python model approach. It starts with an overview of four patterns of inference workflows in production, where a single trained model is usually not enough to meet the business application requirements and additional preprocessing and postprocessing steps are needed. It then presents a step-by-step guide to implementing a multi-step inference pipeline that wraps the previously fine-tuned DL sentiment model with language detection, caching, and additional model metadata. This inference pipeline is logged as a generic MLflow PyFunc model that can be loaded with the common MLflow PyFunc load API. Having an inference pipeline wrapped as an MLflow model opens the door to automation and consistent management of the model pipeline within the same MLflow framework.
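
The custom Python model approach boils down to subclassing mlflow.pyfunc.PythonModel; the sketch below shows the shape of such a wrapper, with the language-detection step reduced to a stub and the fine-tuned model URI left as a placeholder.

```python
# Sketch of wrapping a multi-step inference pipeline as a generic MLflow
# PyFunc model. The preprocessing stub and the fine-tuned model URI are
# placeholders; caching and richer metadata are omitted for brevity.
import mlflow
import mlflow.pyfunc

class InferencePipeline(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load the previously fine-tuned sentiment model shipped as an artifact.
        self.model = mlflow.pyfunc.load_model(context.artifacts["finetuned_model"])

    def _detect_language(self, text):
        return "en"  # placeholder preprocessing step

    def predict(self, context, model_input):
        # Preprocess -> model inference -> postprocess (attach metadata).
        languages = [self._detect_language(t) for t in model_input["text"]]
        predictions = self.model.predict(model_input)
        return {"predictions": predictions, "languages": languages}

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="inference_pipeline",
        python_model=InferencePipeline(),
        artifacts={"finetuned_model": "runs:/<finetuned_run_id>/model"},
    )
```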

Chapter 8, Deploying a DL Inference Pipeline at Scale, covers deploying a DL inference pipeline to different host environments for production usage. It starts with an overview of the landscape of deployment and hosting environments, including batch inference and streaming inference at scale. It then describes the different deployment mechanisms, such as MLflow's built-in model serving tools, custom deployment plugins, and generic model serving frameworks such as Ray Serve. It shows examples of how to deploy a batch inference pipeline using MLflow's Spark user-defined function (UDF), and how to serve a DL inference pipeline as a local web service using either MLflow's built-in model serving tool or Ray Serve's MLflow deployment plugin, mlflow-ray-serve. It concludes with a complete step-by-step guide to deploying a DL inference pipeline to a managed AWS SageMaker instance for production usage.
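
Of those mechanisms, the batch path is the quickest to sketch: the logged PyFunc pipeline becomes a Spark UDF. The model URI, input path, and column name below are placeholders.

```python
# Sketch of batch inference with MLflow's Spark UDF, assuming a running Spark
# session and the PyFunc inference pipeline logged/registered earlier.
# The model URI, file paths, and column name are placeholders.
import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch_inference").getOrCreate()

# Wrap the logged PyFunc inference pipeline as a Spark UDF.
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri="models:/inference_pipeline/1")

df = spark.read.parquet("reviews.parquet")               # placeholder input data
scored = df.withColumn("prediction", predict_udf("text"))
scored.write.parquet("scored_reviews.parquet")
```

For the online path, the same logged model can be served locally with MLflow's built-in mlflow models serve command or through the mlflow-ray-serve plugin, as the chapter walks through, before moving on to the managed SageMaker deployment.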

Chapter 9, Fundamentals of Deep Learning Explainability, covers the foundational concepts of explainability and explores two popular explainability tools. It starts with an overview of the eight dimensions of explainability and explainable AI (XAI), then provides concrete learning examples exploring the usage of the SHAP and transformers-interpret toolboxes for an NLP sentiment pipeline. It emphasizes that explainability should be treated as a first-class artifact when developing a DL application, since there are increasing demands and expectations for model and data explanations across business applications and domains.
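
As a flavor of the SHAP exploration, the sketch below explains a sentiment prediction at the token level; it uses a generic pretrained Hugging Face pipeline as a stand-in for the fine-tuned model from earlier chapters.

```python
# Sketch of token-level SHAP explanations for an NLP sentiment pipeline.
# A generic pretrained pipeline stands in for the book's fine-tuned model.
import shap
from transformers import pipeline

classifier = pipeline("sentiment-analysis", return_all_scores=True)

explainer = shap.Explainer(classifier)   # SHAP picks a text masker automatically
shap_values = explainer(["MLflow made tracking this experiment painless."])

# Visualize per-token contributions toward each sentiment class.
shap.plots.text(shap_values)
```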

Chapter 10, Implementing DL Explainability with MLflow, covers how to implement DL explainability with MLflow to provide explanation as a service (EaaS). It starts with an overview of MLflow's current capabilities for supporting explainers and explanations; notably, MLflow's existing SHAP integration does not support DL explainability at scale. The chapter therefore provides two generic implementation approaches, using MLflow's artifact-logging APIs and PyFunc APIs. Examples are provided for implementing SHAP explanations, logging SHAP values as a bar chart in the MLflow tracking server's artifact store. A SHAP explainer can also be logged as an MLflow Python model and then loaded either as a Spark UDF for batch explanation or as a web service for online EaaS. This provides maximum flexibility within a unified MLflow framework for implementing explainability.
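
A condensed sketch of the PyFunc route, wrapping a SHAP explainer so it can later be loaded as a Spark UDF or served as a web service, might look like the following; the underlying sentiment pipeline, names, and output format are placeholders.

```python
# Sketch of logging a SHAP explainer as a generic MLflow PyFunc model so it
# can be reused for batch explanation (Spark UDF) or online EaaS (web service).
# The sentiment pipeline and the output serialization are placeholders.
import mlflow
import mlflow.pyfunc
import pandas as pd

class SHAPExplainerModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import shap
        from transformers import pipeline
        classifier = pipeline("sentiment-analysis", return_all_scores=True)
        self.explainer = shap.Explainer(classifier)

    def predict(self, context, model_input: pd.DataFrame):
        shap_values = self.explainer(model_input["text"].tolist())
        # Serialize the raw SHAP values so the result survives UDF/web transport.
        return pd.Series(
            [str(shap_values[i].values) for i in range(len(model_input))]
        )

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="shap_explainer",
        python_model=SHAPExplainerModel(),
    )

# Later, the logged explainer can be loaded with mlflow.pyfunc.load_model(),
# wrapped with mlflow.pyfunc.spark_udf() for batch explanation, or served
# behind a web endpoint for online EaaS.
```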