Learning Bayesian Models with R

By: Hari Manassery Koduvely

Overview of this book

Bayesian Inference provides a unified framework to deal with all sorts of uncertainties when learning patterns from data using machine learning models and using them to predict future observations. However, learning and implementing Bayesian models is not easy for data science practitioners due to the level of mathematical treatment involved. Also, applying Bayesian methods to real-world problems requires high computational resources. With the recent advances in computation and several open source packages available in R, Bayesian modeling has become more feasible for practical applications today. Therefore, it would be advantageous for all data scientists and engineers to understand Bayesian methods and apply them in their projects to achieve better results. Learning Bayesian Models with R starts by giving you comprehensive coverage of Bayesian Machine Learning models and the R packages that implement them. It begins with an introduction to the fundamentals of probability theory and R programming for those who are new to the subject. Then the book covers some of the important machine learning methods, both supervised and unsupervised, implemented using Bayesian Inference and R. Every chapter begins with a theoretical description of the method, explained in a very simple manner. Then, relevant R packages are discussed and some illustrations using data sets from the UCI Machine Learning repository are given. Each chapter ends with some simple exercises for you to get hands-on experience of the concepts and R packages discussed in the chapter. The last chapters are devoted to the latest developments in the field, specifically Deep Learning, which uses a class of Neural Network models that are currently at the frontier of Artificial Intelligence. The book concludes with the application of Bayesian methods to Big Data using the Hadoop and Spark frameworks.

Expectations and covariance

Once the distribution of a set of random variables $\{x_1, x_2, \ldots, x_N\}$ is known, what one is typically interested in for real-life applications is estimating the average values of these random variables and the correlations between them. These are computed formally using the following expressions:

$$E[x_i] = \int x_i \, P(x_1, x_2, \ldots, x_N) \, dx_1 \, dx_2 \cdots dx_N$$

$$\sigma(x_i, x_j) = E\big[(x_i - E[x_i])\,(x_j - E[x_j])\big]$$

For example, in the case of a two-dimensional normal distribution, if we are interested in finding the correlation between the variables $x_1$ and $x_2$, it can be formally computed from the joint distribution using the following formula:

$$\rho = \frac{\int\!\!\int x_1 x_2 \, P(x_1, x_2) \, dx_1 \, dx_2 - E[x_1]\,E[x_2]}{\sqrt{\sigma(x_1, x_1)\,\sigma(x_2, x_2)}}$$
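
As a rough numerical illustration of these quantities, the following sketch draws samples from a two-dimensional normal distribution and estimates the means, the covariance, and the correlation from them. The correlation value of 0.6, the sample size, the seed, and all function names are assumptions chosen for illustration. Python's standard library is used only to keep the sketch free of dependencies.

```python
import math
import random


def sample_bivariate_normal(n, rho, rng):
    """Draw n pairs (x1, x2) from a standard bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # x2 mixes z1 with an independent z2 so that corr(x1, x2) = rho
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pairs


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample analogue of E[(x - E[x]) (y - E[y])]."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance scaled by the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


rng = random.Random(42)  # fixed seed so the run is repeatable
pairs = sample_bivariate_normal(50_000, rho=0.6, rng=rng)
x1 = [a for a, _ in pairs]
x2 = [b for _, b in pairs]

print("E[x1]        ~", round(mean(x1), 3))
print("cov(x1, x2)  ~", round(covariance(x1, x2), 3))
print("corr(x1, x2) ~", round(correlation(x1, x2), 3))
```

With this many samples, the estimated correlation should land close to the value used to generate the data, which is the sample-based counterpart of evaluating the integrals analytically.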

Binomial distribution

A binomial distribution is a discrete distribution that gives the probability of obtaining k heads in n independent trials, where each trial has one of two possible outcomes, heads or tails, with the probability of heads being p. Each of the trials is called a Bernoulli trial. The functional form of the binomial distribution is given by:

$$P(k; n, p) = \binom{n}{k} \, p^{k} (1 - p)^{n - k}$$

Here, $P(k; n, p)$ denotes the probability of having k heads in n trials. The mean of the binomial distribution is given by np and the variance is given by np(1-p). Have a look at the following graphs:

[Figure: binomial distribution for n = 100 and n = 1000, with p = 0.7]

The preceding graphs show the binomial distribution for two values of n, 100 and 1000, with p = 0.7. As you can see, when n becomes large, the binomial distribution becomes sharply peaked relative to its range. It can be shown that, in the large n limit, a binomial distribution can be approximated by a normal distribution with mean np and variance np(1-p). Many discrete distributions share this characteristic: in the large n limit, they can be approximated by some continuous distribution.
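
The normal approximation can be checked numerically. The sketch below evaluates the binomial probability at its peak for n = 100 and n = 1000 with p = 0.7 and compares it with the density of a normal distribution with mean np and variance np(1-p). The function names are illustrative, and the log-space evaluation is a choice made here to avoid overflow for large n.

```python
import math


def binomial_pmf(k, n, p):
    """P(k heads in n trials), computed in log space to stay stable for large n."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p))


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


p = 0.7
for n in (100, 1000):
    mean, variance = n * p, n * p * (1.0 - p)
    peak = round(mean)  # the distribution peaks near k = np
    exact = binomial_pmf(peak, n, p)
    approx = normal_pdf(peak, mean, variance)
    print(f"n={n}: binomial={exact:.5f}, normal approximation={approx:.5f}")
```

The two values should agree more closely as n grows, in line with the large n limit described above.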

Beta distribution

The Beta distribution, denoted by $\mathrm{Beta}(x \mid \alpha, \beta)$, is a function of powers of $x$ and of its reflection $(1 - x)$, and is given by:

$$\mathrm{Beta}(x \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)} \, x^{\alpha - 1} (1 - x)^{\beta - 1}$$

Here, $\alpha, \beta > 0$ are parameters that determine the shape of the distribution function and $B(\alpha, \beta)$ is the Beta function given by the ratio of Gamma functions: $B(\alpha, \beta) = \Gamma(\alpha)\,\Gamma(\beta) / \Gamma(\alpha + \beta)$.
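
A minimal sketch of this density, assuming the standard parameterization with two positive shape parameters (called `a` and `b` in the code), is given below. It computes the Beta function from log-Gamma values and checks numerically that the density integrates to one and that its mean is a / (a + b). The parameter values and the simple midpoint integrator are illustrative choices.

```python
import math


def beta_function(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b), via log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def beta_pdf(x, a, b):
    """Density x^(a-1) * (1-x)^(b-1) / B(a, b) for 0 < x < 1."""
    return x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) / beta_function(a, b)


def integrate(f, lo, hi, steps=20_000):
    """Midpoint rule; adequate for a smooth density on a bounded interval."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


a, b = 2.0, 5.0  # illustrative shape parameters
area = integrate(lambda x: beta_pdf(x, a, b), 0.0, 1.0)
mean = integrate(lambda x: x * beta_pdf(x, a, b), 0.0, 1.0)

print("total probability ~", area)
print("numerical mean    ~", mean)
print("a / (a + b)       =", a / (a + b))
```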

The Beta distribution is a very important distribution in Bayesian inference. It is the conjugate prior probability distribution (which will be defined more precisely in the next chapter) for binomial, Bernoulli, negative binomial, and geometric distributions. It is used for modeling the random behavior of percentages and proportions. For example, the Beta distribution has been used for modeling allele frequencies in population genetics, time allocation in project management, the proportion of minerals in rocks, and heterogeneity in the probability of HIV transmission.
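
To give a feel for what conjugacy means in practice before the formal definition, the sketch below uses the standard result that a Beta prior with parameters (a, b), combined with k heads observed in n binomial trials, gives a Beta posterior with parameters (a + k, b + n - k). The prior parameters and the data are made-up illustrative values. The check at the end confirms that prior times likelihood is proportional to the claimed posterior.

```python
import math


def beta_pdf(x, a, b):
    """Beta density evaluated in log space, valid for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def binomial_likelihood(k, n, p):
    """Probability of k heads in n trials when the probability of heads is p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def posterior_parameters(a, b, k, n):
    """Beta(a, b) prior plus k heads in n trials gives Beta(a + k, b + n - k)."""
    return a + k, b + (n - k)


a, b = 2.0, 2.0  # illustrative prior
k, n = 7, 10     # illustrative data: 7 heads in 10 trials
post_a, post_b = posterior_parameters(a, b, k, n)

# If the posterior is right, prior * likelihood / posterior is the same
# constant (the marginal likelihood) for every value of p.
ratios = [
    beta_pdf(p, a, b) * binomial_likelihood(k, n, p) / beta_pdf(p, post_a, post_b)
    for p in (0.2, 0.5, 0.8)
]

print("posterior parameters:", post_a, post_b)
print("ratios:", ratios)
```

Because the posterior stays in the Beta family, updating on new data only requires adding counts to the two parameters.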

Gamma distribution

The Gamma distribution, denoted by $\mathrm{Gamma}(x \mid \alpha, \beta)$, is another common distribution used in Bayesian inference. It is used for modeling waiting times, such as survival times. Special cases of the Gamma distribution are the well-known Exponential and Chi-Square distributions.

In Bayesian inference, the Gamma distribution is used as a conjugate prior for the inverse of the variance of a one-dimensional normal distribution, or for parameters such as the rate ($\lambda$) of an exponential or Poisson distribution.

The mathematical form of a Gamma distribution is given by:

$$\mathrm{Gamma}(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \, x^{\alpha - 1} e^{-\beta x}$$

Here, $\alpha$ and $\beta$ are the shape and rate parameters, respectively (both take values greater than zero). There is also a form in terms of the scale parameter $\theta = 1/\beta$, which is common in econometrics. Another related distribution is the Inverse-Gamma distribution, which is the distribution of the reciprocal of a variable that is distributed according to the Gamma distribution. It is mainly used in Bayesian inference as the conjugate prior distribution for the variance of a one-dimensional normal distribution.
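
The two special cases mentioned earlier can be verified directly from the density. In the sketch below, which assumes the shape and rate parameterization, a shape of 1 reproduces the Exponential density with the same rate, and a shape of d/2 with rate 1/2 reproduces the Chi-Square density with d degrees of freedom. The evaluation point and the parameter values are arbitrary illustrative choices.

```python
import math


def gamma_pdf(x, shape, rate):
    """Density rate^shape / Gamma(shape) * x^(shape-1) * exp(-rate * x), for x > 0."""
    log_density = (shape * math.log(rate) - math.lgamma(shape)
                   + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_density)


def exponential_pdf(x, rate):
    """Exponential density with the given rate."""
    return rate * math.exp(-rate * x)


def chi_square_pdf(x, dof):
    """Chi-Square density with dof degrees of freedom."""
    half = dof / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))


x = 1.7  # arbitrary evaluation point
print("Gamma(shape=1, rate=0.5):", gamma_pdf(x, 1.0, 0.5))
print("Exponential(rate=0.5):   ", exponential_pdf(x, 0.5))
print("Gamma(shape=2, rate=0.5):", gamma_pdf(x, 2.0, 0.5))
print("Chi-Square(dof=4):       ", chi_square_pdf(x, 4))
```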

Dirichlet distribution

The Dirichlet distribution is a multivariate analogue of the Beta distribution. It is commonly used in Bayesian inference as the conjugate prior distribution for the multinomial and categorical distributions. The main reason for this is that inference techniques, such as Gibbs sampling, are easy to implement on the Dirichlet-multinomial distribution.

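The Dirichlet distribution of order $K \geq 2$ is defined over an open $(K - 1)$-dimensional simplex as follows:

$$\mathrm{Dir}(x_1, \ldots, x_K \mid \alpha_1, \ldots, \alpha_K) = \frac{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}$$

Here, $x_1, \ldots, x_{K-1} > 0$, $x_1 + \cdots + x_{K-1} < 1$, and $x_K = 1 - x_1 - \cdots - x_{K-1}$. The parameters satisfy $\alpha_i > 0$.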
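
One common way to draw from a Dirichlet distribution is to sample independent Gamma variables, one per component with that component's parameter as the shape, and divide each by their sum. The sketch below uses this construction with Python's `random.gammavariate`. The parameter vector, the seed, and the function name are illustrative assumptions. It then checks that every draw lies on the simplex and that the sample means approach each parameter divided by the sum of all parameters.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one point on the simplex by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)    # fixed seed so the run is repeatable
alphas = [2.0, 3.0, 5.0]  # illustrative parameters for K = 3
samples = [sample_dirichlet(alphas, rng) for _ in range(20_000)]

# Each component's sample mean should be near alpha_i / sum(alphas).
means = [sum(s[i] for s in samples) / len(samples) for i in range(len(alphas))]
expected = [a / sum(alphas) for a in alphas]

print("sample means:  ", [round(m, 3) for m in means])
print("expected means:", expected)
```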

Wishart distribution

The Wishart distribution is a multivariate generalization of the Gamma distribution. It is defined over symmetric, non-negative definite, matrix-valued random variables. In Bayesian inference, it is used as the conjugate prior for the inverse of the covariance matrix, $\Sigma^{-1}$ (the precision matrix), of the normal distribution. This parallels the role of the Gamma distribution discussed earlier, which is the conjugate prior for the inverse of the variance of a one-dimensional normal distribution.

The mathematical definition of the Wishart distribution is as follows:

$$f(X) = \frac{|X|^{(n - p - 1)/2} \, \exp\left(-\tfrac{1}{2} \, \mathrm{tr}(V^{-1} X)\right)}{2^{np/2} \, |V|^{n/2} \, \Gamma_p\left(\tfrac{n}{2}\right)}$$

Here, $|X|$ denotes the determinant of the matrix $X$ of dimension $p \times p$, and $n \geq p$ is the degrees of freedom. $V$ is the $p \times p$ scale matrix and $\Gamma_p$ is the multivariate Gamma function.

A special case of the Wishart distribution, obtained when $p = V = 1$, is the well-known Chi-Square distribution with $n$ degrees of freedom.
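
This reduction can be worked out in a line. Assuming the common parameterization of the Wishart density with a $p \times p$ scale matrix $V$ and $n$ degrees of freedom (the notation here is an assumption where the original formula is not legible), setting $p = 1$ and $V = 1$ turns the determinant into the scalar itself, the trace into $x$, and the multivariate Gamma function into the ordinary Gamma function:

```latex
f(X) = \frac{|X|^{(n-p-1)/2} \exp\left(-\tfrac{1}{2}\,\mathrm{tr}(V^{-1}X)\right)}
            {2^{np/2}\,|V|^{n/2}\,\Gamma_p\left(\tfrac{n}{2}\right)}
\quad\xrightarrow{\;p = 1,\; V = 1\;}\quad
f(x) = \frac{x^{\,n/2 - 1}\, e^{-x/2}}{2^{n/2}\,\Gamma\left(\tfrac{n}{2}\right)}
```

The right-hand side is the Chi-Square density with $n$ degrees of freedom, which is itself a Gamma density with shape $n/2$ and rate $1/2$.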

Wikipedia gives a list of more than 100 useful distributions that are commonly used by statisticians (reference 1 in the Reference section of this chapter). Interested readers should refer to this article.