Causal Inference and Discovery in Python

By: Aleksander Molak
4.7 (9)

Overview of this book

Causal methods present unique challenges compared to traditional machine learning and statistics. Learning causality can be challenging, but it offers distinct advantages that elude a purely statistical mindset. Causal Inference and Discovery in Python helps you unlock the potential of causality. You’ll start with the basic motivations behind causal thinking and a comprehensive introduction to Pearlian causal concepts, such as structural causal models, interventions, counterfactuals, and more. Each concept is accompanied by a theoretical explanation and a set of practical exercises with Python code. Next, you’ll dive into the world of causal effect estimation, steadily progressing towards modern machine learning methods. Step by step, you’ll discover the Python causal ecosystem and harness the power of cutting-edge algorithms. You’ll further explore the mechanics of how “causes leave traces” and compare the main families of causal discovery algorithms. The final chapter gives you a broad outlook on the future of causal AI, examining challenges and opportunities and providing a comprehensive list of resources to learn more. By the end of this book, you will be able to build your own models for causal inference and discovery using statistical and machine learning techniques, as well as perform basic project assessment.
Table of Contents (21 chapters)
Part 1: Causality – an Introduction
Part 2: Causal Inference
Part 3: Causal Discovery

Inverse probability weighting (IPW)

In this section, we’ll discuss IPW. We’ll see how IPW can be used to de-bias our causal estimates, and we’ll implement it using DoWhy.

Many faces of propensity scores

Although propensity scores might not be the best choice for matching, they still might be useful in other contexts. IPW is a method that allows us to control for confounding by creating so-called pseudo-populations within our data. Pseudo-populations are created by upweighting the underrepresented and downweighting the overrepresented groups in our dataset.
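
To make the weighting idea concrete, here is a minimal sketch in plain Python. It is not the book's DoWhy implementation. It assumes a binary treatment and propensity scores that are already known, and the function name `ipw_ate` and the toy data are hypothetical. Treated units get weight 1/e and control units get weight 1/(1 - e), where e is the propensity score, and the estimate is the difference of the two weighted outcome means (the normalized form of the estimator).

```python
def ipw_ate(treatment, outcome, propensity):
    """Normalized IPW estimate of the average treatment effect.

    treatment  -- list of 0/1 treatment indicators
    outcome    -- list of observed outcomes
    propensity -- list of P(T=1 | X) per unit, strictly between 0 and 1
    (All names here are illustrative assumptions, not a library API.)
    """
    w_treated = w_control = 0.0
    y_treated = y_control = 0.0
    for t, y, e in zip(treatment, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t == 1:
            w = 1.0 / e          # upweight treated units that were unlikely to be treated
            w_treated += w
            y_treated += w * y
        else:
            w = 1.0 / (1.0 - e)  # upweight control units that were likely to be treated
            w_control += w
            y_control += w * y
    return y_treated / w_treated - y_control / w_control


# Toy data (made up for illustration): two strata with propensities 0.25 and 0.75.
# Outcomes follow Y = 2*T + 10*X, so the true effect is 2 and X confounds it.
treatment  = [1, 0, 0, 0, 1, 1, 1, 0]
outcome    = [2, 0, 0, 0, 12, 12, 12, 10]
propensity = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]

treated = [y for t, y in zip(treatment, outcome) if t == 1]
control = [y for t, y in zip(treatment, outcome) if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print("naive difference:", naive)
print("IPW estimate:", round(ipw_ate(treatment, outcome, propensity), 6))
```

In this toy setup the naive difference in means is far from the true effect of 2, while the weighted estimate recovers it, because the weights balance the two strata across the treated and control groups.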

Imagine that we want to estimate the effect of drug D. If males and females react differently to D and we have 2 males and 6 females in the treatment group and 12 males and 2 females in the control group, we might end up with a situation similar to the one that we’ve seen in Chapter 1: the drug is good for everyone, but is harmful to females and males!
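
The counts in this example can be reweighted by hand. The sketch below assumes that sex is the only confounder and estimates each person's probability of landing in their group directly from the counts; the helper name `pseudo_population` is hypothetical. Each cell is multiplied by the inverse of that probability to obtain the size of the corresponding pseudo-population.

```python
from fractions import Fraction

# Counts from the example: (sex, group) -> number of people
counts = {
    ("male", "treated"): 2,
    ("female", "treated"): 6,
    ("male", "control"): 12,
    ("female", "control"): 2,
}


def pseudo_population(counts):
    """Reweight each (sex, group) cell by 1 / P(group | sex)."""
    weighted = {}
    for (sex, group), n in counts.items():
        total = counts[(sex, "treated")] + counts[(sex, "control")]
        # Probability of being in this group given sex, estimated from the counts.
        p_group = Fraction(n, total)
        weighted[(sex, group)] = n * (1 / p_group)
    return weighted


pseudo = pseudo_population(counts)
for cell, size in pseudo.items():
    print(cell, size)
```

After weighting, the treated and control pseudo-populations each contain 14 males and 8 females, so the sex ratio no longer differs between the two groups.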

This is Simpson’s paradox at its...