Interpretable Machine Learning with Python - Second Edition

By : Serg Masís

Overview of this book

Interpretable Machine Learning with Python, Second Edition, brings to light the key concepts of interpreting machine learning models by analyzing real-world data, providing you with a wide range of skills and tools to decipher the results of even the most complex models. Build your interpretability toolkit with several use cases, from flight delay prediction to waste classification to COMPAS risk assessment scores. This book is full of useful techniques, each introduced alongside a fitting use case. Learn methods ranging from traditional ones, such as feature importance and partial dependence plots, to integrated gradients for NLP interpretations and gradient-based attribution methods, such as saliency maps. In addition to the step-by-step code, you’ll get hands-on with tuning models and training data for interpretability by reducing complexity, mitigating bias, placing guardrails, and enhancing reliability. By the end of the book, you’ll be confident in tackling interpretability challenges with black-box models using tabular, language, image, and time series data.



Preparations

You can find the code for this example here: https://github.com/PacktPublishing/Interpretable-Machine-Learning-with-Python-2E/tree/main/02/CVD.ipynb.

Loading the libraries

To run this example, we need to install the following libraries:

mldatasets to load the dataset
pandas and numpy to manipulate it
statsmodels to fit the logistic regression model
sklearn (scikit-learn) to split the data
matplotlib and seaborn to visualize the interpretations

We should load all of them first:

import math
import mldatasets
import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns

Understanding and preparing the data

The data to be used in this example should then be loaded into a DataFrame we call cvd_df:

cvd_df = mldatasets.load("cardiovascular-disease")

From this, we should get 70,000 records and 12 columns. We can take a peek at what was loaded with info():

cvd_df.info()

The preceding command will output the names of each column with its type and how many non-null records it contains:

RangeIndex: 70000 entries, 0 to 69999
Data columns (total 12 columns):
age            70000 non-null int64
gender         70000 non-null int64
height         70000 non-null int64
weight         70000 non-null float64
ap_hi          70000 non-null int64
ap_lo          70000 non-null int64
cholesterol    70000 non-null int64
gluc           70000 non-null int64
smoke          70000 non-null int64
alco           70000 non-null int64
active         70000 non-null int64
cardio         70000 non-null int64
dtypes: float64(1), int64(11)

The data dictionary

To understand what was loaded, the following is the data dictionary, as described in the source:

age: Age of the patient, in days (objective feature)
height: In centimeters (objective feature)
weight: In kg (objective feature)
gender: A binary where 1: female, 2: male (objective feature)
ap_hi: Systolic blood pressure, which is the arterial pressure exerted when blood is ejected during ventricular contraction. Normal value: < 120 mmHg (objective feature)
ap_lo: Diastolic blood pressure, which is the arterial pressure in between heartbeats. Normal value: < 80 mmHg (objective feature)
cholesterol: An ordinal where 1: normal, 2: above normal, and 3: well above normal (objective feature)
gluc: An ordinal where 1: normal, 2: above normal, and 3: well above normal (objective feature)
smoke: A binary where 0: non-smoker and 1: smoker (subjective feature)
alco: A binary where 0: non-drinker and 1: drinker (subjective feature)
active: A binary where 0: non-active and 1: active (subjective feature)
cardio: A binary where 0: no CVD and 1: has CVD (objective and target feature)

It’s essential to understand the data generation process of a dataset, which is why the features are split into two categories:

Objective: A feature that is a product of official documents or a clinical examination. It is expected to have a rather insignificant margin of error due to clerical or machine errors.
Subjective: Reported by the patient and not verified (or unverifiable). In this case, due to lapses of memory, differences in understanding, or dishonesty, it is expected to be less reliable than objective features.

At the end of the day, trusting the model is often about trusting the data used to train it, so how much patients lie about smoking can make a difference.
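To keep this distinction handy during later analysis, one option is to record each feature's source in code. The following is a minimal sketch, not part of the book's notebook: the FEATURE_SOURCE dictionary and the features_by_source helper are hypothetical names, and the labels are copied from the data dictionary above (the cardio target is left out on purpose).

```python
# Hypothetical lookup of how each feature was collected, following the
# objective/subjective labels in the data dictionary (target excluded).
FEATURE_SOURCE = {
    "age": "objective",
    "height": "objective",
    "weight": "objective",
    "gender": "objective",
    "ap_hi": "objective",
    "ap_lo": "objective",
    "cholesterol": "objective",
    "gluc": "objective",
    "smoke": "subjective",
    "alco": "subjective",
    "active": "subjective",
}


def features_by_source(source):
    """Return the feature names whose recorded source matches `source`."""
    return [name for name, src in FEATURE_SOURCE.items() if src == source]


objective_features = features_by_source("objective")
subjective_features = features_by_source("subjective")
print(subjective_features)
```

With lists like these, it becomes easy to check later whether the less reliable, self-reported features are the ones driving a prediction.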

Data preparation

For the sake of interpretability and model performance, there are several data preparation tasks that we can perform, but the one that stands out right now is age. Age is not something we usually measure in days. In fact, for health-related predictions like this one, we might even want to bucket ages into age groups, since health differences observed between individual year-of-birth cohorts aren’t as evident as those observed between generational cohorts, especially when cross-tabulating with other features like lifestyle differences. For now, we will convert all ages into years:

cvd_df['age'] = cvd_df['age'] / 365.24

The result is a more understandable column because we expect age values to be between 0 and 120. We took existing data and transformed it. This is an example of feature engineering, which is when we use the domain knowledge of our data to create features that better represent our problem, thereby improving our models. We will discuss this further in Chapter 11, Bias Mitigation and Causal Inference Methods. There’s value in performing feature engineering simply to make model outcomes more interpretable as long as this doesn’t significantly hurt model performance. In fact, it might improve predictive performance. Note that there was no loss in data in the feature engineering performed on the age column, as the decimal value for years is maintained.
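The age-group bucketing mentioned earlier is not performed in this example, but it could look roughly like the sketch below. It runs on a toy DataFrame with made-up ages in days, and the decade-wide bins and their labels are assumptions chosen only for illustration.

```python
import pandas as pd

# Toy ages in days (hypothetical values, not rows from the dataset).
toy_df = pd.DataFrame({"age": [11000, 15000, 19000, 22500, 23700]})

# Same conversion as above: days to fractional years.
toy_df["age"] = toy_df["age"] / 365.24

# Assumed decade-wide bins; right=False makes each bin include its lower
# edge, giving the groups [30, 40), [40, 50), [50, 60), and [60, 70).
bins = [30, 40, 50, 60, 70]
labels = ["30-39", "40-49", "50-59", "60-69"]
toy_df["age_group"] = pd.cut(
    toy_df["age"], bins=bins, labels=labels, right=False
)

print(toy_df)
```

Unlike the conversion to years, bucketing does discard information, so it trades some precision for groups that may be easier to reason about.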

Now we are going to take a peek at what the summary statistics are for each one of our features using the describe() method:

cvd_df.describe(percentiles=[.01,.99]).transpose()

Figure 2.1 shows the summary statistics output by the preceding code. It includes the 1st and 99th percentiles, which show where the low and high extremes of each feature begin:

Figure 2.1: Summary statistics for the dataset

In Figure 2.1, age appears valid because it ranges between 29 and 65 years, which is not out of the ordinary, but there are some anomalous outliers for ap_hi and ap_lo. Blood pressure can’t be negative, and the highest ever recorded was 370. Keeping these outliers in the data can lead to poor model performance and interpretability. Given that the 1st and 99th percentiles in Figure 2.1 still show values in normal ranges, the invalid values are confined to the extreme tails of each feature, which puts the share of affected records at close to 2%. If you dig deeper, you’ll realize it’s closer to 1.8%:

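# Flag readings outside a plausible range: above 370, or at or below 40
incorrect_l = cvd_df[
    (cvd_df['ap_hi'] > 370)
    | (cvd_df['ap_hi'] <= 40)
    | (cvd_df['ap_lo'] > 370)
    | (cvd_df['ap_lo'] <= 40)
].index
# Fraction of records flagged as invalid
print(len(incorrect_l) / cvd_df.shape[0])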

There are many ways we could handle these incorrect values, but because they are relatively few records and we lack the domain expertise to guess if they were mistyped (and correct them accordingly), we will delete them:

cvd_df.drop(incorrect_l, inplace=True)
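To see what the filter above catches, here is a small sketch on a toy DataFrame with made-up readings, one row per condition. It expresses the same rule with Series.between as an alternative spelling (an illustration, not the book's code) and checks that both versions flag the same rows.

```python
import pandas as pd

# Toy readings (hypothetical values chosen to trigger each condition).
toy_df = pd.DataFrame({
    "ap_hi": [120, -100, 14020, 130, 110],
    "ap_lo": [80, 80, 90, 1000, 40],
})

# A reading counts as plausible when 40 < value <= 370, which is the
# complement of the four conditions used in the filter above.
valid_hi = toy_df["ap_hi"].between(40, 370, inclusive="right")
valid_lo = toy_df["ap_lo"].between(40, 370, inclusive="right")
invalid = ~(valid_hi & valid_lo)

# The original formulation, for comparison.
original = (
    (toy_df["ap_hi"] > 370)
    | (toy_df["ap_hi"] <= 40)
    | (toy_df["ap_lo"] > 370)
    | (toy_df["ap_lo"] <= 40)
)

print(invalid.equals(original))  # both rules flag the same rows
print(invalid.mean())            # fraction of flagged rows in the toy data
cleaned = toy_df[~invalid]
```

Note that the lower bound of 40 is stricter than simply rejecting negative values, so a reading of exactly 40 is also treated as invalid.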

For good measure, we ought to make sure that ap_hi is never lower than ap_lo, so any record with that discrepancy should also be dropped:

cvd_df = cvd_df[cvd_df['ap_hi'] >=\
                cvd_df['ap_lo']].reset_index(drop=True)
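The reset_index(drop=True) call matters because filtering keeps the original row labels, which leaves gaps in the index. A minimal sketch on a toy DataFrame (hypothetical values) shows the difference:

```python
import pandas as pd

# Toy readings; the second row has ap_hi below ap_lo and will be dropped.
toy_df = pd.DataFrame({
    "ap_hi": [120, 70, 140, 90],
    "ap_lo": [80, 110, 90, 90],
})

# Keep rows where systolic pressure is at least the diastolic pressure.
filtered = toy_df[toy_df["ap_hi"] >= toy_df["ap_lo"]]
print(filtered.index.tolist())   # → [0, 2, 3]

# drop=True discards the old labels instead of adding them as a column.
reindexed = filtered.reset_index(drop=True)
print(reindexed.index.tolist())  # → [0, 1, 2]
```

The row where both readings equal 90 is kept, since the comparison is greater than or equal to.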

Now, in order to fit a logistic regression model, we must put all objective and subjective features together as X and the target feature alone as y. After this, we split X and y into training and test datasets, setting random_state for reproducibility:

y = cvd_df['cardio']
X = cvd_df.drop(['cardio'], axis=1).copy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=9
)

The scikit-learn train_test_split function puts 15% of the observations in the test dataset and the remainder in the train dataset, so you end up with X and y pairs for both.
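The following sketch illustrates both points, the 15% split and the effect of random_state, on synthetic data. The X_toy and y_toy names and their random contents are assumptions made for the illustration; only the train_test_split arguments match the call above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and target (hypothetical data).
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({
    "ap_hi": rng.integers(90, 180, size=1000),
    "smoke": rng.integers(0, 2, size=1000),
})
y_toy = pd.Series(rng.integers(0, 2, size=1000), name="cardio")

# Two splits with the same random_state should select the same rows.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X_toy, y_toy, test_size=0.15, random_state=9
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X_toy, y_toy, test_size=0.15, random_state=9
)

print(len(X_train_a), len(X_test_a))             # sizes of the two parts
print(X_test_a.index.equals(X_test_b.index))     # same rows both times
print(X_train_a.index.equals(y_train_a.index))   # X and y stay aligned
```

Changing or omitting random_state would generally produce a different partition on each run, which makes results harder to reproduce and compare.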

Now that we have our data ready for training, let’s train a model and interpret it.
