# What this book covers

*Chapter 1*, *Causality: Hey, We Have Machine Learning, So Why Even Bother?*, briefly discusses the history of causality and presents a number of motivating examples. This chapter introduces the notion of spuriousness and demonstrates that some classic definitions of causality fail to capture important aspects of causal learning (aspects that even human babies seem to grasp). It also draws the basic distinction between statistical and causal learning, which is a cornerstone for the rest of the book.

*Chapter 2*, *Judea Pearl and the Ladder of Causation*, defines the **Ladder of Causation** – a crucial concept introduced by Judea Pearl that emphasizes the differences between observational, interventional, and counterfactual queries and distributions. We build on top of these ideas and translate them into concrete code examples. Finally, we briefly discuss how different families of machine learning (supervised, reinforcement, semi-supervised, and unsupervised) relate to causal modeling.

*Chapter 3*, *Regression, Observations, and Interventions*, takes a look at linear regression from a causal perspective. We analyze important properties of observational data and discuss their significance for causal reasoning. We re-evaluate the problem of statistical control through the causal lens and introduce **structural causal models** (**SCMs**). These topics build a strong foundation for the rest of the book.
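The observational-versus-interventional contrast at the heart of this chapter can be previewed with a toy SCM (the structure and coefficients below are illustrative assumptions, not examples from the book):

```python
# Toy SCM with a confounder: Z -> X, Z -> Y, X -> Y.
# The true causal effect of X on Y is 3.0, but the naive observational
# regression slope is biased because Z influences both X and Y.
import random

random.seed(0)
n = 100_000

def slope(xs, ys):
    # OLS slope of ys on xs: cov(x, y) / var(x)
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

# Observational regime: X listens to its parent Z
z = [random.gauss(0, 1) for _ in range(n)]
x_obs = [2 * zi + random.gauss(0, 1) for zi in z]
y_obs = [3 * xi + 4 * zi + random.gauss(0, 1) for xi, zi in zip(x_obs, z)]

# Interventional regime: do(X = x) severs the Z -> X edge
x_do = [random.gauss(0, 1) for _ in range(n)]
y_do = [3 * xi + 4 * zi + random.gauss(0, 1) for xi, zi in zip(x_do, z)]

obs_slope = slope(x_obs, y_obs)   # ~4.6: biased by the confounder Z
do_slope = slope(x_do, y_do)      # ~3.0: the true causal effect
print(round(obs_slope, 1), round(do_slope, 1))
```

The only difference between the two regimes is who sets `X` – its parent `Z` or an external intervention – yet the regression slopes disagree sharply.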

*Chapter 4*, *Graphical Models*, starts with a refresher on graphs and basic graph theory. After refreshing the fundamental concepts, we use them to define **directed acyclic graphs** (**DAGs**) – one of the most crucial concepts in Pearlian causality. We briefly introduce the sources of causal graphs in the real world and touch upon causal models that are not easily describable using DAGs. This prepares us for *Chapter 5*.

*Chapter 5*, *Forks, Chains, and Immoralities*, focuses on three basic graphical structures: forks, chains, and immoralities (also known as colliders). We learn about the crucial properties of these structures and demonstrate how these graphical concepts manifest themselves in the statistical properties of the data. The knowledge we gain in this chapter will be one of the fundamental building blocks of the concepts and techniques that we introduce in *Part 2* and *Part 3* of this book.
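The way a collider "manifests in the data" can be sketched numerically: two independent causes become spuriously associated once we condition on their common effect (the variable names, noise levels, and selection threshold below are illustrative):

```python
# Collider: X -> C <- Y. X and Y are independent, but selecting samples
# on the collider C (a form of conditioning on it) induces a spurious
# negative association between X and Y.
import random

random.seed(1)
n = 100_000

def corr(us, vs):
    # Pearson correlation
    m = len(us)
    mu, mv = sum(us) / m, sum(vs) / m
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    vu = sum((u - mu) ** 2 for u in us)
    vv = sum((v - mv) ** 2 for v in vs)
    return cov / (vu * vv) ** 0.5

x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
c = [xi + yi + random.gauss(0, 0.5) for xi, yi in zip(x, y)]

full_corr = corr(x, y)  # ~0.0: independent in the full sample

# Keep only rows where the collider is large
sel = [(xi, yi) for xi, yi, ci in zip(x, y, c) if ci > 1.0]
xs, ys = zip(*sel)
sel_corr = corr(xs, ys)  # clearly negative in the selected subsample
print(round(full_corr, 2), round(sel_corr, 2))
```

Intuitively, within the selected subsample a small `x` must be compensated by a large `y` (and vice versa) for `c` to clear the threshold – hence the induced negative correlation.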

*Chapter 6*, *Nodes, Edges, and Statistical (In)Dependence*, builds on top of the concepts introduced in *Chapter 5* and takes them a step further. We introduce the concept of **d-separation**, which will allow us to systematically evaluate conditional independence queries in DAGs, and define the notion of **estimand**. Finally, we discuss three popular estimands and the conditions under which they can be applied.
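d-separation statements translate into testable conditional independencies. As a small sketch (the chain structure and the partial-correlation test are illustrative assumptions, not the book's examples): in a chain X → Z → Y, X and Y are d-connected marginally but d-separated by {Z}, and the data reflect this.

```python
# Chain X -> Z -> Y: X and Y are marginally dependent, but the partial
# correlation of X and Y given Z is approximately zero, matching the
# d-separation statement "Z blocks the path X -> Z -> Y".
import random

random.seed(2)
n = 100_000

def corr(us, vs):
    # Pearson correlation
    m = len(us)
    mu, mv = sum(us) / m, sum(vs) / m
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    vu = sum((u - mu) ** 2 for u in us)
    vv = sum((v - mv) ** 2 for v in vs)
    return cov / (vu * vv) ** 0.5

x = [random.gauss(0, 1) for _ in range(n)]
z = [xi + random.gauss(0, 1) for xi in x]
y = [zi + random.gauss(0, 1) for zi in z]

r_xy, r_xz, r_zy = corr(x, y), corr(x, z), corr(z, y)
# Partial correlation of X and Y given Z
r_xy_given_z = (r_xy - r_xz * r_zy) / ((1 - r_xz**2) * (1 - r_zy**2)) ** 0.5

print(round(r_xy, 2))          # clearly positive: marginally dependent
print(round(r_xy_given_z, 2))  # ~0.0: independent given Z
```

For a fork X ← Z → Y the same calculation gives the same pattern, which is why observational data alone cannot distinguish the two structures.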

*Chapter 7*, *The Four-Step Process of Causal Inference*, takes us to the practical side of causality. We introduce DoWhy – an open source causal inference library created by researchers from Microsoft – and show how to carry out a full causal inference process using its intuitive APIs. We demonstrate how to define a causal model, find a relevant estimand, estimate causal effects, and perform **refutation tests**.

*Chapter 8*, *Causal Models – Assumptions and Challenges*, brings our attention back to the topic of assumptions. Assumptions are a crucial and indispensable part of any causal project or analysis. In this chapter, we take a broader view and discuss the most important assumptions from the point of view of two causal formalisms: the **Pearlian** (graph-based) framework and the **potential outcomes** framework.

*Chapter 9*, *Causal Inference and Machine Learning – from Matching to Meta-learners*, opens the door to causal estimation beyond simple linear models. We start by introducing the ideas behind **matching** and **propensity scores** and discussing why propensity scores should not be used for matching. We introduce meta-learners – a class of models that can be used for the estimation of **conditional average treatment effects** (**CATEs**) – and implement them using the DoWhy and EconML packages.
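The core logic of one meta-learner – the T-learner – can be sketched without any library: fit one outcome model per treatment arm and take the difference of their predictions as the CATE estimate (the data-generating process and the hand-rolled linear base learners below are illustrative assumptions):

```python
# T-learner sketch: one base learner per arm, CATE(x) = mu1(x) - mu0(x).
# Base learners here are simple 1-D least-squares lines.
import random

random.seed(3)
n = 40_000

def fit_line(xs, ys):
    # Returns (intercept, slope) of the least-squares line of ys on xs
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Ground truth: baseline outcome mu0(x) = x, treatment effect tau(x) = 1 + 2x
data = []
for _ in range(n):
    x = random.uniform(-1, 1)
    t = random.random() < 0.5
    y = x + (1 + 2 * x) * t + random.gauss(0, 1)
    data.append((x, t, y))

treated = [(x, y) for x, t, y in data if t]
control = [(x, y) for x, t, y in data if not t]
a1, b1 = fit_line(*zip(*treated))   # outcome model for the treated arm
a0, b0 = fit_line(*zip(*control))   # outcome model for the control arm

def cate(x):
    return (a1 + b1 * x) - (a0 + b0 * x)

print(round(cate(0.0), 1))   # ~1.0 = tau(0)
print(round(cate(0.5), 1))   # ~2.0 = tau(0.5)
```

In practice the base learners would be arbitrary supervised models, which is exactly the flexibility EconML's meta-learner classes provide.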

*Chapter 10*, *Causal Inference and Machine Learning – Advanced Estimators, Experiments, Evaluations, and More*, introduces more advanced estimators: **DR-Learner**, **double machine learning** (**DML**), and **causal forest**. We show how to use CATE estimators with experimental data and introduce a number of useful evaluation metrics that can be applied in real-world scenarios. We conclude the chapter with a brief discussion of counterfactual explanations.

*Chapter 11*, *Causal Inference and Machine Learning – Deep Learning, NLP, and Beyond*, introduces deep learning models for CATE estimation and a PyTorch-based CATENets library. In the second part of the chapter, we take a look at the intersection of causal inference and NLP and introduce **CausalBert** – a Transformer-based model that can be used to remove spurious relationships present in textual data. We close the chapter with an introduction to the **synthetic control estimator**, which we use to estimate causal effects in real-world data.

*Chapter 12*, *Can I Have a Causal Graph, Please?*, provides us with a deeper look at the real-world sources of causal knowledge and introduces us to the concept of automated **causal discovery**. We discuss the idea of expert knowledge and its value in the process of causal analysis.

*Chapter 13*, *Causal Discovery and Machine Learning – from Assumptions to Applications*, starts with a review of the assumptions required by some of the popular causal discovery algorithms. We introduce four main families of causal discovery methods and implement key algorithms using the gCastle library, addressing some of the important challenges along the way. Finally, we demonstrate how to encode expert knowledge when working with selected methods.

*Chapter 14*, *Causal Discovery and Machine Learning – Advanced Deep Learning and Beyond*, introduces an advanced causal discovery algorithm – **DECI**. We implement it using modules from Causica, an open source Microsoft library, and train it using PyTorch. We present methods that allow us to work with datasets with hidden confounding and implement one of them – **fast causal inference** (**FCI**) – using the `causal-learn` library. Finally, we briefly discuss two frameworks that allow us to combine observational and interventional data in order to make causal discovery more efficient and less error-prone.

*Chapter 15*, *Epilogue*, closes *Part 3* of the book with a summary of what we’ve learned, a discussion of causality in business, a sneak peek into the (potential) future of the field, and pointers to more resources on causal inference and discovery for those who are ready to continue their causal journey.