Book Image

Interpretable Machine Learning with Python - Second Edition

By : Serg Masís
4 (4)
Book Image

Interpretable Machine Learning with Python - Second Edition

4 (4)
By: Serg Masís

Overview of this book

Interpretable Machine Learning with Python, Second Edition, brings to light the key concepts of interpreting machine learning models by analyzing real-world data, providing you with a wide range of skills and tools to decipher the results of even the most complex models. Build your interpretability toolkit with several use cases, from flight delay prediction to waste classification to COMPAS risk assessment scores. This book is full of useful techniques, introducing them to the right use case. Learn traditional methods, such as feature importance and partial dependence plots to integrated gradients for NLP interpretations and gradient-based attribution methods, such as saliency maps. In addition to the step-by-step code, you’ll get hands-on with tuning models and training data for interpretability by reducing complexity, mitigating bias, placing guardrails, and enhancing reliability. By the end of the book, you’ll be confident in tackling interpretability challenges with black-box models using tabular, language, image, and time series data.
Table of Contents (17 chapters)
15
Other Books You May Enjoy
16
Index

The mission

Imagine you are an analyst for a national health ministry, and there’s a Cardiovascular Diseases (CVDs) epidemic. The minister has made it a priority to reverse the growth and reduce the caseload to a 20-year low. To this end, a task force has been created to find clues in the data to ascertain the following:

  • What risk factors can be addressed.
  • If future cases can be predicted, interpret predictions on a case-by-case basis.

You are part of this task force!

Details about CVD

Before we dive into the data, we must gather some important details about CVD in order to do the following:

  • Understand the problem’s context and relevance.
  • Extract domain knowledge information that can inform our data analysis and model interpretation.
  • Relate an expert-informed background to a dataset’s features.

CVDs are a group of disorders, the most common of which is coronary heart disease (also known as Ischaemic Heart Disease). According to the World Health Organization, CVD is the leading cause of death globally, killing close to 18 million people annually. Coronary heart disease and strokes (which are, for the most part, a byproduct of CVD) are the most significant contributors to that. It is estimated that 80% of CVD is made up of modifiable risk factors. In other words, some of the preventable factors that cause CVD include the following:

  • Poor diet
  • Smoking and alcohol consumption habits
  • Obesity
  • Lack of physical activity
  • Poor sleep

Also, many of the risk factors are non-modifiable and, therefore, known to be unavoidable, including the following:

  • Genetic predisposition
  • Old age
  • Male (varies with age)

We won’t go into more domain-specific details about CVD because it is not required to make sense of the example. However, it can’t be stressed enough how central domain knowledge is to model interpretation. So, if this example was your job and many lives depended on your analysis, it would be advisable to read the latest scientific research on the subject and consult with domain experts to inform your interpretations.