Bayesian Analysis with Python

Overview of this book

The purpose of this book is to teach the main concepts of Bayesian data analysis. We will learn how to effectively use PyMC3, a Python library for probabilistic programming, to perform Bayesian parameter estimation and to check and validate models. This book begins by presenting the key concepts of the Bayesian framework and the main advantages of this approach from a practical point of view. Moving on, we will explore the power and flexibility of generalized linear models and how to adapt them to a wide array of problems, including regression and classification. We will also look into mixture models and clustering data, and we will finish with advanced topics like non-parametric models and Gaussian processes. With the help of Python and PyMC3 you will learn to implement, check, and expand Bayesian models to solve data analysis problems.

Preface

Bayesian statistics has been around for more than 250 years now. During this time it has enjoyed as much recognition and appreciation as disdain and contempt. Over the last few decades it has gained more and more attention from people in statistics and almost all other sciences, engineering, and even outside the walls of the academic world. This revival has been possible due to theoretical and computational developments. Modern Bayesian statistics is mostly computational statistics. The necessity for flexible and transparent models and a more intuitive interpretation of statistical analysis has only contributed to the trend.

Here, we will adopt a pragmatic approach to Bayesian statistics and we will not care too much about other statistical paradigms and their relationship to Bayesian statistics. The aim of this book is to learn about Bayesian data analysis with the help of Python. Philosophical discussions are interesting but they have already been undertaken elsewhere in a richer way than we can discuss in these pages.

We will take a modeling approach to statistics: we will learn to think in terms of probabilistic models and apply Bayes' theorem to derive the logical consequences of our models and data. The approach will also be computational; models will be coded using PyMC3, a great library for Bayesian statistics that hides most of the mathematical details and computations from the user. Bayesian methods are theoretically grounded in probability theory, so it's no wonder that many books about Bayesian statistics are full of mathematical formulas requiring a certain level of mathematical sophistication. Learning the mathematical foundations of statistics could certainly help you build better models and gain intuition about problems, models, and results. Nevertheless, libraries such as PyMC3 allow us to learn and do Bayesian statistics with only modest mathematical knowledge, as you will be able to verify for yourself throughout this book.

What this book covers

Chapter 1, Thinking Probabilistically – A Bayesian Inference Primer, tells us about Bayes' theorem and its implications for data analysis. We then proceed to describe the Bayesian way of thinking and how and why probabilities are used to deal with uncertainty. This chapter contains the foundational concepts used in the rest of the book.

Chapter 2, Programming Probabilistically – A PyMC3 Primer, revisits the concepts from the previous chapter, this time from a more computational perspective. The PyMC3 library is introduced and we learn how to use it to build probabilistic models, get results by sampling from the posterior, diagnose whether the sampling was done right, and analyze and interpret Bayesian results.

Chapter 3, Juggling with Multi-Parametric and Hierarchical Models, tells us about the very basis of Bayesian modeling, and we start adding complexity to the mix. We learn how to build and analyze models with more than one parameter and how to put structure into models, taking advantage of hierarchical models.

Chapter 4, Understanding and Predicting Data with Linear Regression Models, tells us how linear regression is a very widely used model per se and a building block of more complex models. In this chapter, we apply linear models to solve regression problems and learn how to adapt them to deal with outliers and multiple variables.

Chapter 5, Classifying Outcomes with Logistic Regression, generalizes the linear model from the previous chapter to solve classification problems, including problems with multiple input and output variables.

Chapter 6, Model Comparison, discusses the difficulties associated with comparing models that are common in statistics and machine learning. We will also learn a bit of theory behind the information criteria and Bayes factors and how to use them to compare models, including some caveats of these methods.

Chapter 7, Mixture Models, discusses how to mix simpler models to build more complex ones. This leads us to new models and also to reinterpret models learned in previous chapters. Problems, such as data clustering and dealing with count data, are discussed.

Chapter 8, Gaussian Processes, closes the book by briefly discussing some more advanced concepts related to non-parametric statistics. What kernels are, how to use kernelized linear regression, and how to use Gaussian processes for regression are the central themes of this chapter.

What you need for this book

This book is written for Python version >= 3.5, and it is recommended that you use the most recent version of Python 3 that is currently available, although most of the code examples may also run on older versions of Python, including Python 2.7, with minor adjustments.

Perhaps the easiest way to install Python and Python libraries is to use Anaconda, a scientific computing distribution. You can read more about Anaconda and download it from https://www.continuum.io/downloads. Once Anaconda is installed on our system, we can install new Python packages with this command: conda install NamePackage.

We will use the following Python packages:

IPython 5.0
NumPy 1.11.1
SciPy 0.18.1
Pandas 0.18.1
Matplotlib 1.5.3
Seaborn 0.7.1
PyMC3 3.0
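
To compare an existing environment against the list above, a small check such as the following may help. It is only a sketch: the distribution names passed to importlib.metadata are assumed to match the package names as published, the helper installed_versions is a hypothetical name introduced here, and importlib.metadata itself requires Python 3.8 or later, which is newer than the minimum version this book targets.

```python
from importlib import metadata

# Versions listed in the text above; the lowercase distribution
# names used as keys are an assumption, not taken from the book.
LISTED = {
    "ipython": "5.0",
    "numpy": "1.11.1",
    "scipy": "0.18.1",
    "pandas": "0.18.1",
    "matplotlib": "1.5.3",
    "seaborn": "0.7.1",
    "pymc3": "3.0",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(LISTED).items():
        status = version if version is not None else "not installed"
        print("{:<12} listed {:<8} found {}".format(name, LISTED[name], status))
```

A package reported as not installed can then be added with conda install NamePackage, as described above.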

Who this book is for

This book is for undergraduate or graduate students, scientists, and data scientists who are not familiar with the Bayesian statistical paradigm and wish to learn how to do Bayesian data analysis. No previous knowledge of statistics is assumed, for either Bayesian or other paradigms. The required mathematical knowledge is kept to a minimum and all concepts are described and explained with code, figures, and text. Mathematical formulas are used only when we think they can help the reader to better understand the concepts. The book assumes you know how to program in Python. Familiarity with scientific libraries such as NumPy, matplotlib, or Pandas is helpful but not essential.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "To compute the HPD in the correct way we will use the function plot_post."

A block of code is set as follows:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Plot the binomial pmf for every combination of n and p
n_params = [1, 2, 4]
p_params = [0.25, 0.5, 0.75]
x = np.arange(0, max(n_params)+1)
f, ax = plt.subplots(len(n_params), len(p_params), sharex=True,
  sharey=True)
for i in range(len(n_params)):
    for j in range(len(p_params)):
        n = n_params[i]
        p = p_params[j]
        y = stats.binom(n=n, p=p).pmf(x)
        ax[i,j].vlines(x, 0, y, colors='b', lw=5)
        ax[i,j].set_ylim(0, 1)
        # Invisible point whose only job is to carry the legend text
        ax[i,j].plot(0, 0, label="n = {:3.2f}\np = {:3.2f}".format(n, p), alpha=0)
        ax[i,j].legend(fontsize=12)
ax[2,1].set_xlabel('$\\theta$', fontsize=14)
ax[1,0].set_ylabel('$p(y|\\theta)$', fontsize=14)
ax[0,0].set_xticks(x)

Any command-line input or output is written as follows:

conda install NamePackage

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

Log in or register to our website using your e-mail address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Bayesian-Analysis-with-Python. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/BayesianAnalysiswithPython_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
