Handling missing data with univariate imputation using pandas
Generally, there are two approaches to imputing missing data: univariate imputation and multivariate imputation. This recipe will explore univariate imputation techniques available in pandas.
In univariate imputation, you use non-missing values in a single variable (think a column or feature) to impute the missing values for that variable. For example, if you have a sales column in the dataset with some missing values, you can use a univariate imputation method to impute missing sales observations using average sales. Here, a single column (sales) was used to calculate the mean (from non-missing values) for imputation.
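As a minimal sketch of the idea described above, the following snippet fills the gaps in a sales column with the mean of its non-missing values. The sample values and the sales_imputed column name are made up for illustration and are not taken from the recipe's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a single sales column with two missing observations
df = pd.DataFrame({"sales": [100.0, np.nan, 300.0, np.nan, 200.0]})

# Series.mean() skips NaN by default, so only non-missing values are averaged
mean_sales = df["sales"].mean()

# Fill the missing observations with that mean; the original column is kept
# alongside the imputed one (sales_imputed is an illustrative name)
df["sales_imputed"] = df["sales"].fillna(mean_sales)

print(df)
```

Writing the result to a separate column makes it easy to compare the original and imputed values side by side before deciding whether to overwrite the original.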
Some basic univariate imputation techniques include the following: