Chapter 6
Imputation of Missing Data, Financial Analysis, and Delivery to Client
Section 3
Dealing with Missing Data: Imputation Strategies
Recall that in Lesson 1, Data Exploration and Cleaning, we encountered a sizable proportion of samples in the dataset (3,021/29,685 = 10.2%) where the value of the PAY_1 feature was missing. We need to deal with this, because many machine learning algorithms, including the implementations of logistic regression and random forest in scikit-learn, cannot accept input containing missing values for either model training or testing.
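To make the problem concrete, here is a minimal sketch that reproduces the proportion quoted above and shows what happens when a scikit-learn logistic regression is given a feature containing a missing value. The small arrays `X` and `y` are synthetic stand-ins invented for illustration, not the case study data, and the exact error message may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Proportion of samples with a missing PAY_1 value, using the counts from the text
n_missing, n_total = 3021, 29685
missing_fraction = n_missing / n_total
print(f"{missing_fraction:.1%}")  # 10.2%

# Hypothetical toy data: one feature, with one missing value encoded as NaN
X = np.array([[0.0], [1.0], [np.nan], [2.0]])
y = np.array([0, 1, 0, 1])

# Attempt to train; scikit-learn validates the input and rejects NaN values
try:
    LogisticRegression().fit(X, y)
    fit_failed = False
except ValueError as err:
    fit_failed = True
    print("Model training failed:", err)
```

Training stops at input validation, before any fitting takes place, so the missing values have to be either removed or filled in before a model like this can be used.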