Bayesian Analysis with Python - Third Edition

By: Osvaldo Martin

Overview of this book

The third edition of Bayesian Analysis with Python serves as an introduction to the main concepts of applied Bayesian modeling using PyMC, a state-of-the-art probabilistic programming library, and other libraries that support and facilitate modeling: ArviZ, for exploratory analysis of Bayesian models; Bambi, for flexible and easy hierarchical linear modeling; PreliZ, for prior elicitation; PyMC-BART, for flexible non-parametric regression; and Kulprit, for variable selection. This updated edition adds a brief, conceptual introduction to probability theory, introduces new topics such as Bayesian additive regression trees (BART), and features updated examples. Refined explanations, informed by feedback and experience from previous editions, underscore the book's emphasis on Bayesian statistics. You will explore various models, including hierarchical models, generalized linear models for regression and classification, mixture models, Gaussian processes, and BART, using synthetic and real datasets. By the end of this book, you will have a functional understanding of probabilistic modeling, enabling you to design and implement Bayesian models for your data science challenges. You'll be well prepared to delve into more advanced material or specialized statistical modeling if the need arises.

Preface

Chapter 1 Thinking Probabilistically

1.1 Statistics, models, and this book’s approach

1.2 Working with data

1.3 Bayesian modeling

1.4 A probability primer for Bayesian practitioners

1.5 Interpreting probabilities

1.6 Probabilities, uncertainty, and logic

1.7 Single-parameter inference

1.8 How to choose priors

1.9 Communicating a Bayesian analysis

1.10 Summary

1.11 Exercises

Join our community Discord space

Chapter 2 Programming Probabilistically

2.1 Probabilistic programming

2.2 Summarizing the posterior

2.3 Posterior-based decisions

2.4 Gaussians all the way down

2.5 Posterior predictive checks

2.6 Robust inferences

2.7 InferenceData

2.8 Groups comparison

2.9 Summary

2.10 Exercises

Join our community Discord space

Chapter 3 Hierarchical Models

3.1 Sharing information, sharing priors

3.2 Hierarchical shifts

3.3 Water quality

3.4 Shrinkage

3.5 Hierarchies all the way up

3.6 Summary

3.7 Exercises

Join our community Discord space

Chapter 4 Modeling with Lines

4.1 Simple linear regression

4.2 Linear bikes

4.3 Generalizing the linear model

4.4 Counting bikes

4.5 Robust regression

4.6 Logistic regression

4.7 Variable variance

4.8 Hierarchical linear regression

4.9 Multiple linear regression

4.10 Summary

4.11 Exercises

Join our community Discord space

Chapter 5 Comparing Models

5.1 Posterior predictive checks

5.2 The balance between simplicity and accuracy

5.3 Measures of predictive accuracy

5.4 Calculating predictive accuracy with ArviZ

5.5 Model averaging

5.6 Bayes factors

5.7 Bayes factors and inference

5.8 Regularizing priors

5.9 Summary

5.10 Exercises

Join our community Discord space

Chapter 6 Modeling with Bambi

6.1 One syntax to rule them all

6.2 The bikes model, Bambi’s version

6.3 Polynomial regression

6.4 Splines

6.5 Distributional models

6.6 Categorical predictors

6.7 Interactions

6.8 Interpreting models with Bambi

6.9 Variable selection

6.10 Summary

6.11 Exercises

Join our community Discord space

Chapter 7 Mixture Models

7.1 Understanding mixture models

7.2 Finite mixture models

7.3 The non-identifiability of mixture models

7.4 How to choose K

7.5 Zero-Inflated and hurdle models

7.6 Mixture models and clustering

7.7 Non-finite mixture model

7.8 Continuous mixtures

7.9 Summary

7.10 Exercises

Join our community Discord space

Chapter 8 Gaussian Processes

8.1 Linear models and non-linear data

8.2 Modeling functions

8.3 Multivariate Gaussians and functions

8.4 Gaussian processes

8.5 Gaussian process regression

8.6 Gaussian process regression with PyMC

8.7 Gaussian process classification

8.8 Cox processes

8.9 Regression with spatial autocorrelation

8.10 Hilbert space GPs

8.11 Summary

8.12 Exercises

Join our community Discord space

Chapter 9 Bayesian Additive Regression Trees

9.1 Decision trees

9.2 BART models

9.3 Distributional BART models

9.4 Constant and linear response

9.5 Choosing the number of trees

9.6 Summary

9.7 Exercises

Join our community Discord space

Chapter 10 Inference Engines

10.1 Inference engines

10.2 The grid method

10.3 Quadratic method

10.4 Markovian methods

10.5 Sequential Monte Carlo

10.6 Diagnosing the samples

10.7 Convergence

10.8 Effective Sample Size (ESS)

10.9 Monte Carlo standard error

10.10 Divergences

10.11 Keep calm and keep trying

10.12 Summary

10.13 Exercises

Join our community Discord space

Chapter 11 Where to Go Next

Join our community Discord space

Bibliography

Other Books You May Enjoy

Index

1.2 Working with data

Data is an essential ingredient in statistics and data science. Data comes from several sources, such as experiments, computer simulations, surveys, and field observations. If we are the ones in charge of generating or gathering the data, it is always a good idea to first think carefully about the questions we want to answer and which methods we will use, and only then proceed to get the data. There is a whole branch of statistics dealing with data collection, known as experimental design. In the era of the data deluge, we can sometimes forget that gathering data is not always cheap. For example, while it is true that the Large Hadron Collider (LHC) produces hundreds of terabytes a day, its construction took years of manual and intellectual labor.

As a general rule, we can think of the process of generating the data as stochastic, because there is ontological, technical, and/or epistemic uncertainty. That is, the system is intrinsically stochastic, technical issues add noise or prevent us from measuring with arbitrary precision, and/or conceptual limitations veil details from us. For all these reasons, we always need to interpret data in the context of models, including mental and formal ones. Data does not speak but through models.
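To make the idea of a stochastic data-generating process more concrete, here is a minimal sketch that simulates measurements from a hypothetical system. The function name simulate_measurements, its parameters, and all the numerical values are assumptions chosen only for illustration; they do not come from any real experiment or from the rest of the book. The sketch combines intrinsic variability (ontological uncertainty) with instrument noise and finite resolution (technical uncertainty). Epistemic uncertainty, which concerns what we fail to know about the system, is not modeled.

```python
import random
import statistics


def simulate_measurements(true_mean, n, intrinsic_sd=1.0, noise_sd=0.5,
                          resolution=0.1, seed=0):
    """Simulate n readings from a hypothetical noisy measurement process."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    readings = []
    for _ in range(n):
        # Ontological uncertainty: the system itself varies from unit to unit.
        true_value = rng.gauss(true_mean, intrinsic_sd)
        # Technical uncertainty: the instrument adds its own noise...
        noisy_value = true_value + rng.gauss(0.0, noise_sd)
        # ...and can only report values on a grid of finite resolution.
        readings.append(round(noisy_value / resolution) * resolution)
    return readings


data = simulate_measurements(true_mean=10.0, n=1000)
print(f"sample mean: {statistics.mean(data):.2f}")
print(f"sample sd:   {statistics.stdev(data):.2f}")
```

Even in this toy setting, the readings alone do not tell us how much of their spread is due to the system and how much is due to the instrument; separating the two requires a model of how the data was generated.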

In this book, we will assume that we have already collected the data. Our data will also be clean and tidy, something that’s rarely true in the real world. We will make these assumptions to focus on the subject of this book. I just want to emphasize, especially for newcomers to data analysis, that even though they are not covered in this book, there are important skills that you should learn and practice to successfully work with data.

A very useful skill when analyzing data is knowing how to write code in a programming language, such as Python. Manipulating data is usually necessary given that we live in a messy world with even messier data, and coding helps to get things done. Even if you are lucky and your data is very clean and tidy, coding will still be very useful since modern Bayesian statistics is done mostly through programming languages such as Python or R. If you want to learn how to use Python for cleaning and manipulating data, you can find a good introduction in Python for Data Analysis by McKinney [2022].
