Bootstrapping is a procedure similar to jackknifing. The basic bootstrapping method has the following steps:
Generate samples, each of size N, from the original data of size N. Visualize the original data as a bowl of numbers. We create a new sample by drawing numbers at random from the bowl, returning each number to the bowl after it is drawn (that is, sampling with replacement).
For each generated sample, we compute the statistical estimator of interest (for example, the arithmetic mean).
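The two steps above can be sketched with an explicit loop before turning to a vectorized version. This is only an illustrative sketch: the helper name bootstrap_means, the toy array, and the seed are hypothetical choices made for the example and are not part of the original recipe.

```python
import numpy as np

def bootstrap_means(data, n_samples, rng):
    """Return the mean of each of n_samples bootstrap resamples of data."""
    data = np.asarray(data)
    n = data.size
    means = np.empty(n_samples)
    for i in range(n_samples):
        # Step 1: draw n indices with replacement, so each number
        # goes back into the "bowl" after it is picked.
        idx = rng.integers(0, n, size=n)
        # Step 2: compute the estimator of interest on the resample.
        means[i] = data[idx].mean()
    return means

rng = np.random.default_rng(0)   # seed chosen arbitrarily for the sketch
toy = np.array([1, 2, 3, 4, 5])  # hypothetical toy data
toy_means = bootstrap_means(toy, n_samples=10, rng=rng)
print(toy_means.shape)
```

Any other estimator (median, variance, and so on) could replace the call to mean() in step 2 without changing the resampling step.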
We will apply numpy.random.choice() to do bootstrapping:
Generate a data sample following the binomial distribution that simulates flipping a fair coin five times:
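import numpy as np
N = 400
np.random.seed(28)  # fix the seed so the run is reproducible
data = np.random.binomial(5, .5, size=N)  # number of heads in 5 fair flips, repeated N times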
Generate 30 samples and compute their means (more samples will give a better result):
bootstrapped = np.random.choice(data, size=(N, 30))
means = bootstrapped.mean(axis=0)
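As a sanity check on the call above, the following sketch repeats the recipe and inspects the shapes: each of the 30 columns of bootstrapped is one resample of size N, and averaging along axis=0 gives one mean per column. The comparison of the spread of the bootstrap means with the textbook standard error of the mean is an addition for illustration, not part of the original recipe, and the names se_boot and se_formula are hypothetical.

```python
import numpy as np

N = 400
np.random.seed(28)
data = np.random.binomial(5, .5, size=N)

# np.random.choice samples with replacement by default (replace=True).
bootstrapped = np.random.choice(data, size=(N, 30))
means = bootstrapped.mean(axis=0)

print(bootstrapped.shape)  # one column per resample
print(means.shape)         # one mean per resample

# Spread of the bootstrap means: a rough estimate of the standard error.
se_boot = means.std(ddof=1)
# Textbook standard error of the mean, for comparison.
se_formula = data.std(ddof=1) / np.sqrt(N)
print(se_boot, se_formula)
```

With only 30 resamples the two estimates should agree only roughly; increasing the number of resamples should bring them closer together.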
Visualize the arithmetic means distribution with a matplotlib box plot:
plt.title('Bootstrapping demo')
plt...