In this recipe, we'll quickly look at stratified k-fold valuation. We've walked through different recipes where the class representation was unbalanced in some manner. Stratified k-fold is nice because its scheme is specifically designed to maintain the class proportions.
We're going to create a small dataset. In this dataset, we will then use stratified k-fold validation. We want it small so that we can see the variation. For larger samples. it probably won't be as big of a deal.
We'll then plot the class proportions at each step to illustrate how the class proportions are maintained:
>>> from sklearn import datasets >>> X, y = datasets.make_classification(n_samples=int(1e3), weights=[1./11])
Let's check the overall class weight distribution:
>>> y.mean() 0.90300000000000002
Roughly, 90.5 percent of the samples are 1, with the balance 0.