Chapter 8. Monte Carlo Inference
One of the key challenges in supervised learning is the generation or extraction of an appropriate training set. Despite the effort and best intentions of the data scientist, the labeled data is often not directly usable.
Let's take, for example, the problem of predicting the click-through rate for an online display. Typically, 95-99% of the events are labeled as no-click (the negative class), while only 1-5% are labeled as clicked (the positive class). Such an unbalanced training set may produce an erroneous model unless the number of negatively labeled events is reduced through sampling.
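As an illustration of reducing the negative class through sampling, here is a minimal sketch in Python. The function name downsample_negatives, the target_positive_ratio parameter, and the synthetic event list are assumptions made for this example only; they do not come from any particular library, and a production pipeline would likely also need to correct predicted probabilities for the sampling rate.

```python
import random

def downsample_negatives(events, target_positive_ratio=0.5, seed=42):
    """Randomly drop negative events until positives reach the target ratio.

    events: list of (features, label) pairs, label 1 = click, 0 = no click.
    target_positive_ratio: desired fraction of positives in the result
    (hypothetical parameter, must be in (0, 1]).
    """
    if not 0.0 < target_positive_ratio <= 1.0:
        raise ValueError("target_positive_ratio must be in (0, 1]")

    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    positives = [e for e in events if e[1] == 1]
    negatives = [e for e in events if e[1] == 0]

    # Number of negatives to keep so that positives make up the target ratio.
    n_keep = round(len(positives) * (1.0 - target_positive_ratio)
                   / target_positive_ratio)
    n_keep = min(n_keep, len(negatives))

    # Sample negatives without replacement, then shuffle the merged set.
    balanced = positives + rng.sample(negatives, n_keep)
    rng.shuffle(balanced)
    return balanced

# Synthetic data for illustration: 3% of 10,000 events are clicks.
events = [({"id": i}, 1 if i % 100 < 3 else 0) for i in range(10000)]
balanced = downsample_negatives(events, target_positive_ratio=0.5)
n_clicks = sum(label for _, label in balanced)
```

With these synthetic events, all the clicked events are retained and an equal number of no-click events is drawn at random, so the two classes are evenly represented in the resulting training set.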
This chapter deals with the need, role, and some common methods of sampling a dataset. It covers the following topics:
Generation of random samples from a given distribution
Application of Monte Carlo numerical sampling to approximation
Bootstrapping
Markov Chain Monte Carlo for estimating parametric distributions
Although random generators are of critical importance in statistics and machine...