Bayesian Analysis with Python

Overview of this book

The purpose of this book is to teach the main concepts of Bayesian data analysis. We will learn how to effectively use PyMC3, a Python library for probabilistic programming, to perform Bayesian parameter estimation and to check and validate models. The book begins by presenting the key concepts of the Bayesian framework and the main advantages of this approach from a practical point of view. Moving on, we will explore the power and flexibility of generalized linear models and how to adapt them to a wide array of problems, including regression and classification. We will also look into mixture models and clustering data, and we will finish with advanced topics such as nonparametric models and Gaussian processes. With the help of Python and PyMC3, you will learn to implement, check, and expand Bayesian models to solve data analysis problems.

Posterior predictive checks


One of the nice elements of the Bayesian toolkit is that once we have a posterior, we can use it to generate future data y, that is, predictions. Posterior predictive checks consist of comparing the observed data with the predicted data to spot differences between the two sets. The main goal is to check for self-consistency: the generated data and the observed data should look more or less similar; otherwise, something went wrong during the modeling or while feeding the data to the model. But even if we made no mistakes, differences can still arise. Trying to understand the mismatch can lead us to improve the model, or at least to understand its limitations. Knowing which parts of our problem or data the model captures well and which it does not is valuable information, even if we do not know how to improve the model. Maybe the model captures the mean behavior of our data well but fails to predict rare values. This could be a problem for us, or maybe we only care about the mean, in which case the model is good enough.

The general aim is not to declare a model false; instead, we follow George Box's advice: all models are wrong, but some are useful. We just want to know which parts of the model we can trust, and to test whether the model is a good fit for our specific purpose.

How confident one can be about a model is certainly not the same across disciplines. Physics can study systems under highly controlled conditions using high-level theories, so its models are often seen as good descriptions of reality. Other disciplines, such as sociology and biology, study complex systems that are difficult to isolate, and their models have a weaker epistemological status. Nevertheless, whichever discipline you work in, models should always be checked, and posterior predictive checks, together with ideas from exploratory data analysis, are a good way to do so.
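
To make the idea concrete, here is a minimal sketch of a posterior predictive check. It is not the PyMC3 workflow used elsewhere in the book: to stay self-contained it assumes a coin-flip model with a Beta(1, 1) prior, whose posterior is available in closed form, and it uses only NumPy. The data set, the seed, the number of draws, and the helper `longest_run` are all hypothetical choices made for illustration. For each posterior draw of the parameter we simulate one replicated data set, and then we compare summary statistics of the replicated data with those of the observed data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical observed data: 30 coin flips (1 = heads), simulated for illustration only.
observed = rng.binomial(n=1, p=0.35, size=30)
n = observed.size
heads = int(observed.sum())

# Assumption: Beta(1, 1) prior with a Bernoulli likelihood, so the posterior
# is Beta(1 + heads, 1 + tails) and no MCMC sampler is needed.
alpha_post = 1 + heads
beta_post = 1 + (n - heads)

# Draw posterior samples of theta, then simulate one replicated data set per draw.
n_draws = 2000
theta_samples = rng.beta(alpha_post, beta_post, size=n_draws)
replicated = rng.binomial(n=1, p=theta_samples[:, None], size=(n_draws, n))


def longest_run(flips):
    """Return the length of the longest streak of identical consecutive values."""
    best = current = 1
    for prev, curr in zip(flips[:-1], flips[1:]):
        current = current + 1 if curr == prev else 1
        best = max(best, current)
    return best


# Statistic 1: the mean, which this model is expected to reproduce well.
obs_mean = observed.mean()
rep_means = replicated.mean(axis=1)

# Statistic 2: the longest streak, which probes the independence assumption.
obs_run = longest_run(observed)
rep_runs = np.array([longest_run(row) for row in replicated])

# Fraction of replicated data sets whose statistic is at least as large as the
# observed one (often called a posterior predictive p-value).
p_mean = float((rep_means >= obs_mean).mean())
p_run = float((rep_runs >= obs_run).mean())

print(f"observed mean = {obs_mean:.2f}, fraction of replicates >= observed: {p_mean:.2f}")
print(f"observed longest run = {obs_run}, fraction of replicates >= observed: {p_run:.2f}")
```

Fractions close to 0 or 1 indicate that the observed statistic sits in the tails of what the model predicts, which is a hint that the model fails to capture that aspect of the data. Checking more than one statistic reflects the point made above: a model can reproduce the mean well and still miss other features that may or may not matter for our purpose.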