scikit-learn Cookbook

By: Trent Hauck

Overview of this book

Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility, and within the Python data space, scikit-learn is the unequivocal choice for machine learning. Its consistent API and plethora of features help solve any machine learning problem it comes across.

The book starts by walking through different methods to prepare your data, be it a dataset with missing values or text columns that require the categories to be turned into indicator variables. After the data is ready, you'll learn different techniques aligned with different objectives, be it a dataset with known outcomes such as sales by state, or more complicated problems such as clustering similar customers. Finally, you'll learn how to polish your algorithm to ensure that it's both accurate and resilient to new datasets.

Stratified k-fold


In this recipe, we'll take a quick look at stratified k-fold validation. We've walked through several recipes where the class representation was unbalanced in some manner. Stratified k-fold is useful in those cases because its scheme is specifically designed to maintain the class proportions in every fold.

Getting ready

We're going to create a small dataset and then use stratified k-fold validation on it. We want it small so that we can see the variation; for larger samples, it probably won't be as big a deal.

We'll then plot the class proportions at each step to illustrate how the class proportions are maintained:

>>> from sklearn import datasets
>>> X, y = datasets.make_classification(n_samples=int(1e3), weights=[1./11])

Let's check the overall class weight distribution:

>>> y.mean()
0.90300000000000002

Roughly 90.3 percent of the samples are 1, with the balance 0.
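To see the balance per class rather than only the mean, the labels can be counted directly. The following is a minimal sketch, not part of the recipe itself; the `random_state=0` argument and the names `counts` and `proportions` are assumptions added here so the result is repeatable, so the exact figures may differ slightly from the value shown above.

```python
import numpy as np
from sklearn import datasets

# Same call as in the recipe; random_state is an assumption added for repeatability.
X, y = datasets.make_classification(n_samples=int(1e3), weights=[1./11],
                                    random_state=0)

# Count how many samples carry each label (index 0 -> class 0, index 1 -> class 1).
counts = np.bincount(y)

# Turn the counts into fractions of the whole dataset.
proportions = counts / counts.sum()

print(counts)
print(proportions)
```

Because the labels are 0 and 1, the second entry of `proportions` is the same number that `y.mean()` reports.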

How to do it...

Let's create a stratified k-fold object and iterate through each fold. We'll...
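One way such an iteration might look is sketched below. It assumes a recent scikit-learn release in which `StratifiedKFold` lives in `sklearn.model_selection` and is driven through `split(X, y)`; the book's own code may use a different interface. The choice of `n_splits=10`, the `random_state=0` argument, and the name `fold_means` are illustrative assumptions, not taken from the recipe.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold

# Same dataset as in "Getting ready"; random_state is an assumption for repeatability.
X, y = datasets.make_classification(n_samples=int(1e3), weights=[1./11],
                                    random_state=0)

# n_splits=10 is an illustrative choice, not a value from the recipe.
skf = StratifiedKFold(n_splits=10)

# Record the share of class 1 in the held-out part of every fold.
fold_means = []
for train_idx, test_idx in skf.split(X, y):
    fold_means.append(y[test_idx].mean())
fold_means = np.array(fold_means)

print(fold_means)   # per-fold proportion of class 1
print(y.mean())     # overall proportion of class 1, for comparison
```

If the stratification works as described, every entry of `fold_means` should sit close to the overall `y.mean()`, which is the property the recipe sets out to illustrate.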