One major concern that regulators raise about machine learning in the credit risk industry is the black-box nature of the models. This section draws parallels between logistic regression and random forest models to make random forest more transparent, so that regulators find it less intimidating when approving machine learning models for implementation. It also shows readers how statistical models compare with machine learning models.
In the following table, the explanatory variables of both models are listed in descending order of their importance to the model. For the logistic regression model, importance is measured by the p-value (a lower value indicates a better predictor); for random forest, it is the mean decrease in Gini (a higher value indicates a better predictor). Many of the variables rank similarly in both models, such as status_exs_accnt_A14, credit_hist_A34, Installment_rate_in_percentage_of_disposable_income...
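The ordering rule described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used to build the table: the helper names `rank_side_by_side` and `shared_top_k` are hypothetical, and the numbers in `example_p` and `example_gini` (as well as the variable `hypothetical_var_x`) are placeholders invented for the sketch, not fitted results.

```python
def rank_side_by_side(p_values, gini_decrease):
    """Return rows of (rank, logit variable, p-value, forest variable, Gini decrease)."""
    # Logistic regression: a smaller p-value marks a stronger predictor,
    # so variables are sorted in ascending order of p-value.
    logit_order = sorted(p_values.items(), key=lambda kv: kv[1])
    # Random forest: a larger mean decrease in Gini marks a stronger predictor,
    # so variables are sorted in descending order.
    forest_order = sorted(gini_decrease.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (rank, logit_var, p, forest_var, gini)
        for rank, ((logit_var, p), (forest_var, gini))
        in enumerate(zip(logit_order, forest_order), start=1)
    ]


def shared_top_k(p_values, gini_decrease, k):
    """Return the variables that appear in the top k of both rankings."""
    rows = rank_side_by_side(p_values, gini_decrease)[:k]
    logit_top = {row[1] for row in rows}
    forest_top = {row[3] for row in rows}
    return logit_top & forest_top


# Placeholder values for illustration only; these are NOT results from the text.
example_p = {
    "status_exs_accnt_A14": 0.001,
    "credit_hist_A34": 0.004,
    "hypothetical_var_x": 0.030,
}
example_gini = {
    "status_exs_accnt_A14": 15.0,
    "credit_hist_A34": 9.0,
    "hypothetical_var_x": 12.0,
}

for row in rank_side_by_side(example_p, example_gini):
    print(row)
```

In practice, the two dictionaries would be filled from the fitted models (p-values from the logistic regression summary and mean decrease in Gini from the random forest), and `shared_top_k` gives a quick check of how many top-ranked variables the two models have in common.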