•  #### Learning Bayesian Models with R #### Overview of this book

Learning Bayesian Models with R Credits   www.PacktPub.com Preface  Free Chapter
Introducing the Probability Theory The R Environment Introducing Bayesian Inference Machine Learning Using Bayesian Inference Bayesian Regression Models Bayesian Classification Models Bayesian Models for Unsupervised Learning Bayesian Neural Networks Bayesian Modeling at Big Data Scale Index ## Expectations and covariance

Having known the distribution of a set of random variables , what one would be typically interested in for real-life applications is to be able to estimate the average values of these random variables and the correlations between them. These are computed formally using the following expressions:  For example, in the case of two-dimensional normal distribution, if we are interested in finding the correlation between the variables and , it can be formally computed from the joint distribution using the following formula: ### Binomial distribution

A binomial distribution is a discrete distribution that gives the probability of heads in n independent trials where each trial has one of two possible outcomes, heads or tails, with the probability of heads being p. Each of the trials is called a Bernoulli trial. The functional form of the binomial distribution is given by: Here, denotes the probability of having k heads in n trials. The mean of the binomial distribution is given by np and variance is given by np(1-p). Have a look at the following graphs: The preceding graphs show the binomial distribution for two values of n; 100 and 1000 for p = 0.7. As you can see, when n becomes large, the Binomial distribution becomes sharply peaked. It can be shown that, in the large n limit, a binomial distribution can be approximated using a normal distribution with mean np and variance np(1-p). This is a characteristic shared by many discrete distributions that, in the large n limit, they can be approximated by some continuous distributions.

### Beta distribution

The Beta distribution denoted by is a function of the power of , and its reflection is given by: Here, are parameters that determine the shape of the distribution function and is the Beta function given by the ratio of Gamma functions: .

The Beta distribution is a very important distribution in Bayesian inference. It is the conjugate prior probability distribution (which will be defined more precisely in the next chapter) for binomial, Bernoulli, negative binomial, and geometric distributions. It is used for modeling the random behavior of percentages and proportions. For example, the Beta distribution has been used for modeling allele frequencies in population genetics, time allocation in project management, the proportion of minerals in rocks, and heterogeneity in the probability of HIV transmission.

### Gamma distribution

The Gamma distribution denoted by is another common distribution used in Bayesian inference. It is used for modeling the waiting times such as survival rates. Special cases of the Gamma distribution are the well-known Exponential and Chi-Square distributions.

In Bayesian inference, the Gamma distribution is used as a conjugate prior for the inverse of variance of a one-dimensional normal distribution or parameters such as the rate ( ) of an exponential or Poisson distribution.

The mathematical form of a Gamma distribution is given by: Here, and are the shape and rate parameters, respectively (both take values greater than zero). There is also a form in terms of the scale parameter , which is common in econometrics. Another related distribution is the Inverse-Gamma distribution that is the distribution of the reciprocal of a variable that is distributed according to the Gamma distribution. It's mainly used in Bayesian inference as the conjugate prior distribution for the variance of a one-dimensional normal distribution.

### Dirichlet distribution

The Dirichlet distribution is a multivariate analogue of the Beta distribution. It is commonly used in Bayesian inference as the conjugate prior distribution for multinomial distribution and categorical distribution. The main reason for this is that it is easy to implement inference techniques, such as Gibbs sampling, on the Dirichlet-multinomial distribution.

The Dirichlet distribution of order is defined over an open dimensional simplex as follows: Here, , , and .

### Wishart distribution

The Wishart distribution is a multivariate generalization of the Gamma distribution. It is defined over symmetric non-negative matrix-valued random variables. In Bayesian inference, it is used as the conjugate prior to estimate the distribution of inverse of the covariance matrix (or precision matrix) of the normal distribution. When we discussed Gamma distribution, we said it is used as a conjugate distribution for the inverse of the variance of the one-dimensional normal distribution.

The mathematical definition of the Wishart distribution is as follows: Here, denotes the determinant of the matrix of dimension and is the degrees of freedom.

A special case of the Wishart distribution is when corresponds to the well-known Chi-Square distribution function with degrees of freedom.

Wikipedia gives a list of more than 100 useful distributions that are commonly used by statisticians (reference 1 in the Reference section of this chapter). Interested readers should refer to this article.