In the MLE example, the data alone was used to estimate the parameter under observation. In many situations, however, we have a fairly good idea about the parameter beforehand. If we are asked about the fairness of a coin, we are often fairly certain that the value of the parameter is 0.5, that is, that heads and tails are equally likely. Bayesian statistics allows us to take this prior intuition into account and find a posterior that is informed by both the prior and the data. Even though we think the coin is fair, if we observe 30,000 heads out of 100,000 flips, we will be convinced that the parameter is close to 0.3 and not 0.5, as surmised earlier.
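A minimal sketch of this prior-versus-data trade-off, assuming a Beta prior on the coin's bias (the Beta-Binomial conjugate pair and the specific prior parameters are assumptions for illustration, not from the text): with a Beta(α, β) prior and `heads` successes in `flips` trials, the posterior is Beta(α + heads, β + flips − heads), whose mean is easy to compute.

```python
def posterior_mean(alpha: float, beta: float, heads: int, flips: int) -> float:
    """Posterior mean of the coin's bias under a Beta(alpha, beta) prior
    after observing `heads` successes in `flips` Bernoulli trials
    (Beta-Binomial conjugacy: posterior is Beta(alpha + heads, beta + flips - heads))."""
    return (alpha + heads) / (alpha + beta + flips)

# A Beta(500, 500) prior is sharply concentrated near 0.5, encoding a
# strong belief that the coin is fair (hypothetical prior strength).
print(round(posterior_mean(500, 500, 30_000, 100_000), 3))  # -> 0.302
```

Even this strong fairness prior is overwhelmed by 100,000 flips: the posterior mean lands near the empirical frequency 0.3 rather than the prior's 0.5.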
We start the analysis by reversing our assumptions that each flip is independent and that θ is a fixed quantity. We now assume that θ is a random variable and that each successive flip tells us more about the value of θ. We assume that the flips are conditionally independent given θ.
The joint distribution of the tosses and θ is...