4. The Bias-Variance Trade-Off
Activity 4.01: Cross-Validation and Feature Engineering with the Case Study Data
Solution:
- Select the features from the DataFrame of the case study data.
You can use the list of feature names that we've already created in this chapter, but be sure not to include the response variable, which would be a very good (but entirely inappropriate) feature:
features = features_response[:-1]
X = df[features].values
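To see why the slice matters, here is a minimal sketch on a toy DataFrame. The names `df_toy`, `features_response_toy`, and the two feature columns are illustrative assumptions, not the case study data itself; the only thing carried over from the text is that the response variable is the last entry in the list.

```python
import pandas as pd

# Hypothetical toy data standing in for the case study DataFrame.
df_toy = pd.DataFrame({
    'LIMIT_BAL': [20000, 120000, 90000],
    'AGE': [24, 26, 34],
    'default payment next month': [1, 1, 0],
})

# Assumed layout: the response variable is the last name in the list.
features_response_toy = ['LIMIT_BAL', 'AGE', 'default payment next month']

# Dropping the last element leaves only the features.
features_toy = features_response_toy[:-1]
X_toy = df_toy[features_toy].values

print(features_toy)
print(X_toy.shape)
```

If the response variable were not the last item in the list, the `[:-1]` slice would silently drop a real feature instead, so it is worth printing the resulting list once to check.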
- Make a training/test split using a random seed of 24:
X_train, X_test, y_train, y_test = train_test_split(
    X, df['default payment next month'].values,
    test_size=0.2, random_state=24)
We'll use this going forward and reserve this test data as the unseen test set. By specifying the random seed, we can easily create separate notebooks with other modeling approaches using the same training data.
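The reproducibility claim can be checked with a small sketch. The arrays `X_demo` and `y_demo` are synthetic stand-ins (an assumption for illustration), but the `train_test_split` arguments match the ones used in this step.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 50 samples, 3 features, binary response.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = rng.integers(0, 2, size=50)

# Two independent calls with the same seed, as two separate notebooks would make.
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=24)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=24)

# Compare the four returned arrays pairwise.
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)
print(split_a[0].shape, split_a[1].shape)
```

The two splits match only when the input arrays are identical and in the same row order, so each notebook should load and prepare the data in the same way before splitting.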
- Instantiate MinMaxScaler...