scikit-learn Cookbook - Third Edition
In this chapter, we’ve covered several methods commonly applied to data preprocessing. Now it’s time to put it all together! Can you guess what tool might be helpful for this exercise? You got it: the Pipeline() class!
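As a reminder of how `Pipeline` chains preprocessing steps with a final estimator, here is a minimal sketch. The particular steps (median imputation, standard scaling, linear regression), the step names, and the synthetic data are assumptions chosen for illustration; they are not the solution to the exercise.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration, not the exercise dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# Blank out roughly 5% of the values so the imputer has something to do.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each step is a (name, transformer) pair; the last step is the estimator.
pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("model", LinearRegression()),
])

# fit() learns the imputation and scaling statistics from the training split only,
# then score() reuses those statistics when transforming the held-out split.
pipeline.fit(X_train, y_train)
r2 = pipeline.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.3f}")
```

Because the transformers live inside the pipeline, the same object can be passed to cross-validation or grid search without leaking statistics from the test data into preprocessing.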
For these exercises, we will use a publicly available dataset, California Housing, which can be loaded through the scikit-learn library. The dataset contains 20,640 records and 8 features, and the target value (what we are trying to predict with our model) is the median house value for each district, expressed in units of $100,000.
You are tasked with building a comprehensive data pipeline composed of steps you learned in this chapter. In the Jupyter notebook for Chapter 2, you will find an incomplete code block at the end called Comprehensive Pipeline, where you should add your code to complete the following steps: