Hands-On Gradient Boosting with XGBoost and scikit-learn
To get a better sense of how random forests work, let's build one using scikit-learn.
Let's use a random forest classifier to predict whether a user makes more or less than USD 50,000 using the census dataset we cleaned and scored in Chapter 1, Machine Learning Landscape, and revisited in Chapter 2, Decision Trees in Depth. We are going to use cross_val_score, which scores the model on several different train/test splits of the data, to check that our test results generalize well.
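Before working with the census data, it may help to see the overall pattern in isolation. The following is a minimal sketch, not the book's code: it assumes a synthetic dataset from make_classification as a stand-in for the census data, and the names X_demo, y_demo, and rf_demo, as well as the choices of 10 trees and 5 folds, are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the census data (an assumption for illustration only)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=2)

# A small random forest; n_estimators=10 is an arbitrary choice for this sketch
rf_demo = RandomForestClassifier(n_estimators=10, random_state=2)

# Fit and score the model on 5 different train/test splits (folds)
scores = cross_val_score(rf_demo, X_demo, y_demo, cv=5)

print('Accuracy per fold:', scores.round(3))
print('Mean accuracy: %0.3f' % scores.mean())
```

Each entry in scores is the accuracy on one held-out fold, so the mean is usually a steadier estimate of how the model performs on unseen data than a single train/test split.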
The following steps build and score a random forest classifier using the census dataset:
Import pandas, numpy, RandomForestClassifier, and cross_val_score before silencing warnings:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import warnings
warnings.filterwarnings('ignore')
Load the dataset census_cleaned.csv and split it into X (the predictor columns) and y (the target column):
df_census = pd.read_csv...