Book Image

Training Systems using Python Statistical Modeling

By : Curtis Miller
Book Image

Training Systems using Python Statistical Modeling

By: Curtis Miller

Overview of this book

Python's ease-of-use and multi-purpose nature has made it one of the most popular tools for data scientists and machine learning developers. Its rich libraries are widely used for data analysis, and more importantly, for building state-of-the-art predictive models. This book is designed to guide you through using these libraries to implement effective statistical models for predictive analytics. You’ll start by delving into classical statistical analysis, where you will learn to compute descriptive statistics using pandas. You will focus on supervised learning, which will help you explore the principles of machine learning and train different machine learning models from scratch. Next, you will work with binary prediction models, such as data classification using k-nearest neighbors, decision trees, and random forests. The book will also cover algorithms for regression analysis, such as ridge and lasso regression, and their implementation in Python. In later chapters, you will learn how neural networks can be trained and deployed for more accurate predictions, and understand which Python libraries can be used to implement them. By the end of this book, you will have the knowledge you need to design, build, and deploy enterprise-grade statistical models for machine learning using Python and its rich ecosystem of libraries for predictive analytics.
Table of Contents (9 chapters)

Bayesian analysis for proportions

In this section, we'll revisit inference for proportions, but from a Bayesian perspective. We will look at Bayesian methods for analyzing proportions of success in a group. This includes talking about computing credible intervals, and the Bayesian version of hypothesis testing for both one and two samples.

Conjugate priors are a class of prior probability distributions common in Bayesian statistics. A conjugate prior is a prior distribution such that the posterior distribution belongs to the same family of probability distributions as the prior. For binary data, the beta distribution is a conjugate prior. This is a distribution defined where only values in the (0, 1) interval have a chance of appearing. They are specified by two parameters. In a trial, if there are M successes out of N trials, then the posterior distribution is the prior distribution when we add M to the first parameter of the prior, and N - M to the second parameter of the prior. This concentrates the distribution to the observed population proportion.

Conjugate priors for proportions

So, let's see this in action. For data that takes values of either 0 or 1, we're going to use the beta distribution as our conjugate prior. The notation that is used to refer to the beta distribution is B(α, β).

α - 1 can be interpreted as imaginary prior successes, and β - 1 can be interpreted as imaginary prior failures. That's if you have added the data to your datasetimaginary successes and imaginary failures.

If α = β = 1, then we interpret this as being no prior successes or failures; therefore, every probability of success, θ, is equally likely in some sense. This is referred to as an uninformative prior. Let's now implement this using the following steps:

  1. First, we're going to import the beta function from scipy.stats; this is the beta distribution. In addition to this, we will import the numpy library and the matplotlib library, as follows:
  1. We're then going to plot the function and see how it looks, using the following code:

This results in the following output:

So, if we plot β when α=1 and β=1, we end up with a uniform distribution. In some sense, each p is equally likely.

  1. Now, we will use a=3 and b=3, to indicate two imaginary successes and two imaginary failures, which gives us the following output:

Now, our prior distribution biases our data toward 0.5in other words, it is equally likely to succeed as it is to fail.

Given a sample size of N, if there are M successes, then the posterior distribution when the prior is β, with the parameters (α, β), will be B (α + M, β + N - M). So, let's reconsider an earlier example; we have a website with 1,126 visitors. 310 clicked on an ad purchased by a sponsor, and we want to know what proportion of individuals will click on the ad in general.

  1. So, we're going to use our prior distribution beta (3, 3). This means that the posterior distribution will be given by the beta distribution, with the first parameter, 313, and the second parameter, 819. This is what the prior distribution and posterior distribution looks like when plotted against each other:

The blue represents the prior distribution, and red represents the posterior distribution.

Credible intervals for proportions

Bayesian statistics doesn't use confidence intervals but credible intervals instead. We specify a probability, and that will be the probability that the parameter of interest lies in the credible interval. For example, there's a 95% chance that θ lies in its 95% credible interval. We compute credible intervals by computing the quantiles from the posterior distribution of the parameter, so that the chosen proportion of the posterior distribution lies between these two quantiles.

So, I've already gone ahead and written a function that will compute credible intervals for you. You give this function the number of successes, the total sample size, the first argument of the prior and the second argument of the prior, and the credibility (or chance of containing θ) of the interval. You can see the entire function as follows:

So, here is the function; I've already written it so that it works for you. We can use this function to compute credible intervals for our data.

So, we have a 95% credible interval based on the uninformative prior, as follows:

Therefore, we believe that θ will be between 25% and 30%, with a 95% probability.

The next one is the same interval when we have a different priorthat is, the one that we actually used before and is the one that we plotted:

The data hasn't changed very much, but still, this is going to be our credible interval.

The last one is the credible interval when we increase the level of credibility to .99 or the probability of containing the true parameter:

Since this probability is higher, this must be a longer interval, which is exactly what we see, although it's not that much longer.

Bayesian hypothesis testing for proportions

Unlike classical statistics, where we say a hypothesis is either right or wrong, Bayesian statistics holds that every hypothesis is true, with some probability. We don't reject hypotheses, but simply ignore them if they are unlikely to be true. For one sample, computing the probability of a hypothesis can be done by considering what region of possible values of θ correspond to the hypothesis being true, and using the posterior distribution of θ to compute the probability that θ is in that region.

In this case, we need to use what's known as the cumulative distribution function (CDF) of the posterior distribution. This is the probability that a random variable is less than or equal to a quantity, x. So, what we want is the probability that θ is greater than 0.3 when D is given, that is, if we are testing the website administrator's claim that there are at least 30% of visitors to the site clicking on the ad.

So, we will use the CDF function and evaluate it at 0.3. This is going to correspond to the administrator's claim. This will give us the probability that more than 30% of visitors clicked on the ad. The following screenshot shows how we define the CDF function:

What we end up with is a very small probability, therefore, it's likely that the administrator is incorrect.

Now, while there's a small probability, I would like to point out that this is not the same thing as a p value. A p value says something completely different; a p value should not be interpreted as the probability that the null hypothesis is true, whereas, in this case, this can be interpreted as a probability that the hypothesis we asked is true. This is the probability that data is greater than 0.3, given the data that we saw.

Comparing two proportions

Sometimes, we may want to compare two proportions from two populations. Crucially, we will assume that they are independent of each other. It's difficult to analytically compute the probability that one proportion is less than another, so we often rely on Monte Carlo methods, otherwise known as simulation or random sampling.

We randomly generate the two proportions from their respective posterior distributions, and then track how often one is less than the other. We use the frequency we observed in our simulation to estimate the desired probability.

So, let's see this in action; we have two parameters: θA and θB. These correspond to the proportion of individuals who click on an ad from format A or format B. Users are randomly assigned to one format or the other, and the website tracks how many viewers click on the ad in the different formats.

516 visitors saw format A and 108 of them clicked it. 510 visitors saw format B and 144 of them clicked it. We use the same prior for both θA and θB, which is beta (3, 3). Additionally, the posterior distribution for θA will be B (111, 411) and for θB, it will be B (147, 369). This results in the following output:

We now want to know the probability of θA being less than θBthis is difficult to compute analytically. We can randomly simulate θA and θB, and then use that to estimate this probability. So, let's randomly simulate one θA, as follows:

Then, randomly simulate one θB, as follows:

Finally, we're going to do 1,000 simulations by computing 1,000 θA values and 1,000 θB values, as follows:

This is what we end up with; here, we can see how often θA is less than θB, that is, θA was 996 times less than θB. So, what's the average of this? Well, it is 0.996; this is the probability that θA is less than θB, or an estimate of that probability. Given this, it seems highly likely that more people clicked on the ad for format B than people who clicked on the ad for format A.

That's it for proportions. Next up, we will look at Bayesian methods for analyzing the means of quantitative data.