Bayesian Analysis with Python

Overview of this book

The purpose of this book is to teach the main concepts of Bayesian data analysis. We will learn how to effectively use PyMC3, a Python library for probabilistic programming, to perform Bayesian parameter estimation and to check and validate models. The book begins by presenting the key concepts of the Bayesian framework and the main advantages of this approach from a practical point of view. Moving on, we will explore the power and flexibility of generalized linear models and how to adapt them to a wide array of problems, including regression and classification. We will also look into mixture models and clustering data, and we will finish with advanced topics such as non-parametric models and Gaussian processes. With the help of Python and PyMC3, you will learn to implement, check, and expand Bayesian models to solve data analysis problems.
Table of Contents (15 chapters)
Bayesian Analysis with Python
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Index

Mixture models


Sometimes a process or phenomenon under study cannot be properly described using a single distribution, such as a Gaussian, a binomial, or any other canonical (pure) distribution, but it can be described as a mixture of such distributions. Models that assume the data come from a mixture of distributions are known as mixture models.
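As a minimal sketch of what "a mixture of such distributions" means, the density of a finite mixture is a weighted sum of component densities, p(y) = Σ_k w_k p_k(y), where the weights w_k are non-negative and sum to 1. The code below evaluates a mixture of Gaussians with plain NumPy rather than PyMC3; the helper names `gaussian_pdf` and `mixture_pdf` and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def mixture_pdf(x, weights, mus, sigmas):
    """Density of a finite Gaussian mixture: sum_k w_k * N(x | mu_k, sigma_k)."""
    weights = np.asarray(weights, dtype=float)
    # Weights must be non-negative and sum to 1 for the result to be a valid density
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    x = np.asarray(x, dtype=float)
    # Weighted sum of the component densities, evaluated pointwise
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


# A bimodal density that no single Gaussian describes well (illustrative parameters)
grid = np.linspace(-10, 15, 5001)
density = mixture_pdf(grid, weights=[0.3, 0.7], mus=[0.0, 5.0], sigmas=[1.0, 1.5])
```

Because each component integrates to 1 and the weights sum to 1, the mixture also integrates to 1, while its shape (here, two separate peaks) can be far from that of any single component.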

One situation where mixture models arise naturally is when the dataset is better described as a combination of real subpopulations. For example, it makes perfect sense to describe the distribution of heights in an adult human population as a mixture of female and male subpopulations. Moreover, if we also have to deal with non-adults, we may find it useful to include a third group describing children, probably without needing to make a gender distinction inside this group. Another classic example of a mixture model is the description of a set of handwritten digits. In this case, it also makes perfect sense to use 10 subpopulation...
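The heights example can be read as a two-step generative process: first pick a subpopulation for each person, then draw a height from that subpopulation's distribution. The sketch below simulates this with NumPy, assuming Gaussian heights within each group; the function `sample_heights` and all the numbers (weights, means, and standard deviations in cm) are hypothetical values for illustration, not estimates from real data.

```python
import numpy as np

# Hypothetical subpopulation parameters (cm), for illustration only, not estimated from data
weights = np.array([0.5, 0.5])    # assumed fraction of females and males
means = np.array([162.0, 176.0])  # assumed mean height of each subpopulation
sds = np.array([6.5, 7.0])        # assumed spread of heights within each subpopulation


def sample_heights(n, weights, means, sds, seed=0):
    """Draw n heights from a Gaussian mixture.

    Each draw first picks a subpopulation (the latent label), then draws a
    height from that subpopulation's normal distribution.
    """
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(weights), size=n, p=weights)  # latent group of each person
    heights = rng.normal(means[labels], sds[labels])      # height given the group
    return labels, heights


labels, heights = sample_heights(10_000, weights, means, sds)
# Pooled mean versus the mean of each simulated subpopulation
print(heights.mean(), heights[labels == 0].mean(), heights[labels == 1].mean())
```

In a real analysis only `heights` would be observed; the labels are hidden, and recovering the weights, means, and standard deviations from the pooled data is the inference problem a mixture model addresses. Adding a third group for children would only require extending the three parameter arrays.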