scikit-learn Cookbook

By: Trent Hauck

Overview of this book

Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility, and within the Python data space, scikit-learn is the unequivocal choice for machine learning. Its consistent API and plethora of features help solve any machine learning problem it comes across.

The book starts by walking through different methods to prepare your data, be it a dataset with missing values or text columns that require the categories to be turned into indicator variables. After the data is ready, you'll learn different techniques aligned with different objectives, be it a dataset with known outcomes such as sales by state, or more complicated problems such as clustering similar customers. Finally, you'll learn how to polish your algorithm to ensure that it's both accurate and resilient to new datasets.

Using k-NN for regression


Regression is covered elsewhere in the book, but we might also want to run a regression on "pockets" of the feature space. Suppose the dataset was generated by several distinct data processes. If so, training only on similar data points is a good idea.

Getting ready

Our old friend, regression, can be used in the context of clustering. Because regression is a supervised technique, we'll use k-Nearest Neighbors (k-NN), which works with known outcomes, rather than KMeans, which is an unsupervised clustering method.

For k-NN regression, we'll use the K closest points in the feature space to build each prediction, rather than fitting on the entire space as in regular regression.
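To make the idea of using only the K closest points concrete, here is a minimal sketch in plain NumPy. It assumes Euclidean distance and an unweighted mean of the neighbors' targets; the function name `knn_regress` and the toy data are hypothetical and only for illustration.

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Predict a target for x_query as the mean target of its k nearest training points.

    Assumptions (illustrative only): Euclidean distance, uniform weights.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)

    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]

    # Only these k points contribute to the prediction
    return y_train[nearest].mean()

# Toy data with two "pockets" that follow different processes
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_train = np.array([0.0, 1.0, 2.0, 20.0, 22.0, 24.0])

low = knn_regress(X_train, y_train, [1.0], k=3)    # uses only the first pocket
high = knn_regress(X_train, y_train, [11.0], k=3)  # uses only the second pocket
print(low, high)
```

Each query is answered from its own neighborhood, so points from the other pocket never influence the prediction.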

How to do it…

For this recipe, we'll use the iris dataset. If we want to predict something such as the petal width for each flower, clustering by iris species can potentially give us better results. The k-NN regression won't cluster by the species, but we'll work under the assumption that the Xs will be close for the same species, or in this case, the petal...
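A minimal sketch of this setup, assuming scikit-learn's `load_iris`, `LinearRegression`, and `KNeighborsRegressor`. The choice of the first three measurements as features, `n_neighbors=10`, and the in-sample mean squared error comparison are assumptions made for illustration, not necessarily the recipe's exact steps.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

iris = load_iris()

# Columns are sepal length, sepal width, petal length, petal width
X = iris.data[:, :3]  # assumed features: the first three measurements
y = iris.data[:, 3]   # target: petal width

# Baseline: ordinary regression fit on the entire feature space
lr = LinearRegression().fit(X, y)

# k-NN regression: each prediction uses only the nearest neighbors
# (n_neighbors=10 is an arbitrary illustrative choice)
knn = KNeighborsRegressor(n_neighbors=10).fit(X, y)

# In-sample mean squared error for both models (not a held-out estimate)
lr_mse = np.mean((y - lr.predict(X)) ** 2)
knn_mse = np.mean((y - knn.predict(X)) ** 2)

print("Linear regression MSE:", lr_mse)
print("k-NN regression MSE:", knn_mse)
```

Because both errors are measured on the training data, a held-out split or cross-validation would give a fairer comparison.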