Bayesian Analysis with Python

Overview of this book

The purpose of this book is to teach the main concepts of Bayesian data analysis. We will learn how to effectively use PyMC3, a Python library for probabilistic programming, to perform Bayesian parameter estimation and to check and validate models. The book begins by presenting the key concepts of the Bayesian framework and the main advantages of this approach from a practical point of view. Moving on, we will explore the power and flexibility of generalized linear models and how to adapt them to a wide array of problems, including regression and classification. We will also look into mixture models and clustering data, and we will finish with advanced topics like nonparametric models and Gaussian processes. With the help of Python and PyMC3 you will learn to implement, check, and expand Bayesian models to solve data analysis problems.

Robust linear regression


Assuming our data follows a Gaussian distribution is perfectly reasonable in many situations. By assuming Gaussianity, we are not necessarily saying that our data is really Gaussian; instead we are saying that it is a reasonable approximation for our current problem. As we saw in the previous chapter, sometimes this Gaussian assumption fails, for example in the presence of outliers. We learned that using a Student's t-distribution is a way to effectively deal with outliers and get a more robust inference. The very same idea can be applied to linear regression.
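To see why the Student's t-distribution tolerates outliers better, it can help to compare how much log density each distribution assigns to a point far from the mean. The following is a minimal sketch using only the standard library; the function names, the choice of nu=3, and the evaluation points are illustrative assumptions, not taken from the book.

```python
import math

def normal_logpdf(y, mu=0.0, sigma=1.0):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z

def student_t_logpdf(y, nu, mu=0.0, sigma=1.0):
    """Log density of a Student's t with nu degrees of freedom, location mu, scale sigma."""
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

# Compare the two log densities at the center and increasingly far from it.
# nu=3 is an arbitrary illustrative choice for a heavy-tailed t.
for y in (0.0, 2.0, 5.0):
    print(y, normal_logpdf(y), student_t_logpdf(y, nu=3))
```

Far from the mean, the Gaussian log density falls off quadratically while the t log density falls off only logarithmically, so a single extreme observation penalizes a t-based model far less and has less power to drag the estimates toward itself.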

To illustrate the robustness that a Student's t-distribution brings to linear regression, we are going to use a very simple dataset: the third data group from Anscombe's quartet. If you do not know what Anscombe's quartet is, remember to look it up later on Wikipedia. We can load it from seaborn:

import seaborn as sns

# Load the Anscombe quartet and keep the x values of the third dataset (III)
ans = sns.load_dataset('anscombe')
x_3 = ans[ans.dataset == 'III']['x'].values
y_3 = ans[ans.dataset...
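As a rough preview of what robustness means for this dataset, the sketch below compares an ordinary least-squares line with a line fitted under a Student's t likelihood. Note that it is not the PyMC3 model used in this book: it swaps in a simple maximum-likelihood fit by iteratively reweighted least squares (an EM-style scheme) with the degrees of freedom fixed at nu=3, so that it runs with the standard library alone. The data values are typed in from the published Anscombe table and are assumed to match what seaborn returns; the function names and the values of nu and n_iter are hypothetical choices for illustration.

```python
# Anscombe dataset III, typed in from the published table (assumed to match
# what sns.load_dataset('anscombe') returns for dataset == 'III').
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope

def student_t_line(x, y, nu=3.0, n_iter=200):
    """Maximum-likelihood line under a Student's t likelihood with fixed nu.

    Each iteration downweights points with large scaled residuals and then
    refits the line and the scale by weighted least squares.
    """
    n = len(x)
    intercept, slope = weighted_line(x, y, [1.0] * n)  # start from OLS
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in resid) / n
    for _ in range(n_iter):
        # Large residuals relative to the current scale get small weights.
        w = [(nu + 1) / (nu + r * r / sigma2) for r in resid]
        intercept, slope = weighted_line(x, y, w)
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        sigma2 = sum(wi * r * r for wi, r in zip(w, resid)) / n
    return intercept, slope

ols_intercept, ols_slope = weighted_line(x, y, [1.0] * len(x))
t_intercept, t_slope = student_t_line(x, y)
print("least squares:", ols_intercept, ols_slope)
print("Student's t:  ", t_intercept, t_slope)
```

The least-squares line is pulled upward by the single outlier at x = 13, while the t-based fit effectively ignores that point and follows the remaining ten observations, which lie almost exactly on a straight line.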