Practical Deep Learning at Scale with MLflow

By Yong Liu

Overview of this book

The book starts with an overview of the deep learning (DL) life cycle and the emerging Machine Learning Ops (MLOps) field, providing a clear picture of the four pillars of deep learning (data, model, code, and explainability) and the role of MLflow in these areas. From there onward, it guides you step by step in understanding the concept of MLflow experiments and usage patterns, using MLflow as a unified framework to track DL data, code and pipelines, models, parameters, and metrics at scale. You’ll also tackle running DL pipelines in a distributed execution environment with reproducibility and provenance tracking, and tuning DL models through hyperparameter optimization (HPO) with Ray Tune, Optuna, and HyperBand. As you progress, you’ll learn how to build a multi-step DL inference pipeline with preprocessing and postprocessing steps, deploy a DL inference pipeline for production using Ray Serve and AWS SageMaker, and finally create a DL explanation as a service (EaaS) using the popular Shapley Additive Explanations (SHAP) toolbox. By the end of this book, you’ll have built the foundation and gained the hands-on experience you need to develop a DL pipeline solution from initial offline experimentation to final deployment and production, all within a reproducible and open source framework.
Table of Contents (17 chapters)

Section 1 – Deep Learning Challenges and MLflow Prime
Section 2 – Tracking a Deep Learning Pipeline at Scale
Section 3 – Running Deep Learning Pipelines at Scale
Section 4 – Deploying a Deep Learning Pipeline at Scale
Section 5 – Deep Learning Model Explainability at Scale

Understanding DL explainability challenges

In this section, we will discuss DL explainability challenges at each of the stages described in Figure 1.3. It is increasingly important to view explainability as an integral and necessary mechanism to define, test, debug, validate, and monitor models across the entire model life cycle. Embedding explainability early will make subsequent model validation and operations easier. Also, to maintain ongoing trust in ML/DL models, it is critical to be able to explain and debug ML/DL models after they go live in production:

  • Data collection/cleaning/annotation: As we have seen, explainability is critical for understanding model predictions. The root cause of a model's trustworthiness or bias issues can often be traced back to the data used to train the model. Explainability for data is still an emerging area, but it is critical. So, what could go wrong and become a challenge during the data collection/cleaning/annotation stage? For example, let's suppose we have an ML/DL model whose prediction outcome is whether a loan applicant will pay back a loan or not. If the collected data contains certain correlations between age and the loan payback outcome, the model will learn to use age as a predictor. However, a loan decision based on a person's age is against the law and not allowed, even if the model works well. So, during data collection, it could be that the sampling strategy is not sufficient to represent certain subpopulations, such as loan applicants in different age groups.

A subpopulation could also have lots of missing fields and then be dropped during data cleaning, resulting in underrepresentation after the data cleaning process. Human annotations could favor the privileged group, along with other possible unconscious biases. A metric called Disparate Impact, which compares the proportion of individuals who receive a positive outcome in an unprivileged group with the proportion in a privileged group, can reveal hidden biases in the data. If the unprivileged group (for example, persons aged over 60) receives a positive outcome (for example, loan approval) at less than 80% of the rate of the privileged group (persons aged under 60), this is a disparate impact violation based on the current common industry standard (the four-fifths rule); a minimal sketch of this calculation is shown below. Tools such as Dataiku can help to automate disparate impact and subpopulation analysis to find groups of people who may be treated unfairly or differently because of the data used for model training.
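
To make the four-fifths rule concrete, the following is a minimal sketch of how such a check could be computed with pandas. The column names, the toy approval rates, and the threshold handling are illustrative assumptions rather than code from any particular fairness toolkit:

import pandas as pd

def disparate_impact(df, group_col, outcome_col, unprivileged, privileged):
    # Ratio of positive-outcome rates: unprivileged group / privileged group
    rate = lambda group: df.loc[df[group_col] == group, outcome_col].mean()
    return rate(unprivileged) / rate(privileged)

# Toy data: loan approvals for two hypothetical age groups
data = pd.DataFrame({
    "age_group":     ["over_60"] * 10 + ["under_60"] * 10,
    "loan_approved": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]    # 30% approved
                   + [1, 1, 0, 1, 1, 1, 0, 1, 1, 1],   # 80% approved
})

di = disparate_impact(data, "age_group", "loan_approved",
                      unprivileged="over_60", privileged="under_60")
print(f"Disparate impact ratio: {di:.2f}")  # prints 0.38 (0.30 / 0.80)
if di < 0.8:  # the four-fifths rule
    print("Potential disparate impact violation")

In practice, the same ratio can be computed on the model's predicted outcomes as well as on the labels in the training data, so that bias introduced by the model can be separated from bias already present in the data.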

  • Model development: Model explainability during offline experimentation is very important, not only to help understand why a model behaves in a certain way but also to help with model selection, that is, deciding which model to use if we need to put one into production. Accuracy might not be the only criterion for selecting a winning model. There are a few DL explainability tools, such as SHAP (please refer to Figure 1.5). MLflow's integration with SHAP provides a way to implement DL explainability:
Figure 1.5 – NLP text SHAP Variable Importance Plot when using a DL model

Figure 1.5 shows that the most important feature for this NLP model's predictions is the word impressive, followed by rent. Essentially, this opens up the black box of the DL model, giving us much more confidence in using DL models in production.
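
As a rough illustration of this integration, the following sketch trains a simple scikit-learn classifier and logs a SHAP explanation next to the model in an MLflow run. The dataset and model are placeholders chosen for brevity (the figure above comes from an NLP model), and the exact artifacts produced by mlflow.shap.log_explanation can vary across MLflow versions:

import mlflow
import mlflow.shap
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

with mlflow.start_run():
    mlflow.sklearn.log_model(model, artifact_path="model")
    # Computes SHAP values for a small sample of rows and logs them,
    # together with a summary (feature importance) plot, as run artifacts
    mlflow.shap.log_explanation(model.predict, X.sample(50, random_state=0))

The logged explanation can then be inspected in the MLflow UI alongside the model's parameters and metrics, which is what makes explanation tracking reproducible across experiments.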

  • Model deployment and serving in production: During the production stage, if the explainability of the model's predictions can be readily provided to users, then not only will the usability (user-friendliness) of the model improve, but we can also collect better feedback data, as users are more incentivized to give meaningful feedback. A good explainability solution should provide point-level explanations for any prediction outcome. This means that we should be able to answer why a particular person's loan is rejected and how this rejection compares to other people in a similar or different age group. So, the challenge is to make explainability one of the gated deployment criteria for releasing a new version of the model. However, unlike accuracy metrics, explainability is very difficult to measure with scores or thresholds, although certain case-based reasoning could be applied and automated. For example, if we have certain hold-out test cases where we expect the same or similar explanations regardless of the version of the model, then we could use that as a gated release criterion.
  • Model validation and A/B testing: During online experimentation and ongoing production model validation, we need explainability to understand whether the model has been applied to the right data and whether its predictions are trustworthy. ML/DL models usually encode complex and non-linear relationships. During this stage, it is often desirable to understand how the model influences user behavior metrics (for example, a higher conversion rate on a shopping website). Influence sensitivity analysis can provide insights into whether a certain user feature, such as a user's income, has a positive or negative impact on the outcome. If, during this stage, we find for some reason that higher income leads to a lower loan approval rate or a lower conversion rate, this should be automatically flagged. However, automated sensitivity analysis during model validation and A/B testing is still not widely available and remains a challenging problem. A few vendors, such as TruEra, provide potential solutions in this space.
  • Monitoring and feedback loops: While model performance metrics and data characteristics are important here, explainability can provide an incentive for users to give valuable feedback and user behavior metrics that help identify the drivers and causes of model degradation, if there are any. As we know, ML/DL models are prone to overfitting and cannot generalize well beyond their training data. One important explainability solution during model production monitoring is to measure how feature importance shifts across different data splits (for example, pre-COVID versus post-COVID); a minimal sketch of such a comparison follows this list. This can help data scientists to identify whether degradation in model performance is due to changing data (such as a statistical distribution shift) or changing relationships between variables (such as a concept shift). A recent example provided by TruEra (https://truera.com/machine-learning-explainability-is-just-the-beginning/) illustrates how a loan model changed its prediction behavior due to changes in people's annual income and loan purposes before and after the onset of COVID. This kind of feature importance shift analysis greatly helps to identify the root causes of changes in model behavior during the model production monitoring stage.
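
As a rough sketch of how such a feature importance shift could be measured, the following compares mean absolute SHAP values per feature across two data splits standing in for pre- and post-COVID scoring data. The dataset, model, and the idea of ranking features by relative shift are illustrative assumptions; the same comparison could also serve as the kind of explanation-stability check on hold-out cases mentioned earlier for gated releases:

import numpy as np
import pandas as pd
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Stand-ins for "pre-COVID" and "post-COVID" scoring data
split_a, split_b = X.iloc[:200], X.iloc[200:]

explainer = shap.TreeExplainer(model)

def mean_abs_shap(data):
    # Global importance per feature: mean absolute SHAP value over the split
    values = explainer.shap_values(data)   # shape: (rows, features)
    return pd.Series(np.abs(values).mean(axis=0), index=data.columns)

imp_a, imp_b = mean_abs_shap(split_a), mean_abs_shap(split_b)
relative_shift = (imp_b - imp_a).abs() / (imp_a + 1e-9)
print(relative_shift.sort_values(ascending=False).head(5))  # largest shifts

In a monitoring setting, the two splits would come from production data in different time windows, and a large shift for a business-critical feature would be surfaced as an alert for the data science team to investigate.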

In summary, DL explainability is a major challenge where ongoing research is still needed. However, MLflow's integration with SHAP now provides a ready-to-use tool for practical DL applications, which we will cover in our advanced chapter later in this book.