As demonstrated in the previous chapters, turning estimated models into scores is not particularly challenging and could be done on non-Spark platforms; however, Apache Spark makes the process both easy and fast. With the notebook approach adopted in this chapter, we gain the ability to quickly produce new scores whenever the data or the customer's requirements change.
Readers will notice some similarity to the deployment work in the last chapter, where we deployed scoring for fraud detection.
From the coefficients of our predictive models, we can derive a risk score for possible default. This takes some work, but it gives the client the flexibility to change the score whenever needed.
With logistic regression, the process of producing scores is relatively easy, as it uses the standard logistic regression formula:

Prob(Yi = 1) = exp(B'Xi) / (1 + exp(B'Xi))

This produces the default probability, where Yi = 1 denotes a default, and B'Xi is the linear combination of the estimated coefficients B and the feature values Xi. In R, exp(coef(logit_model)) returns the exponentiated coefficients (the odds ratios) of a fitted model, which can then be applied to new data to compute scores.
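To make the scoring step concrete, here is a minimal Python sketch of the logistic transform above. The coefficient values, feature values, and intercept below are hypothetical, chosen purely for illustration, not taken from any fitted model in this chapter:

```python
import math

def default_probability(coefficients, features, intercept=0.0):
    """Logistic transform: Prob(Y = 1) = exp(B'X) / (1 + exp(B'X))."""
    # Linear combination B'X of coefficients and feature values
    bx = intercept + sum(b * x for b, x in zip(coefficients, features))
    return math.exp(bx) / (1.0 + math.exp(bx))

# Hypothetical coefficients and one applicant's feature values
coefs = [0.8, -1.2, 0.05]
features = [1.0, 0.5, 10.0]
prob = default_probability(coefs, features, intercept=-1.5)
print(round(prob, 4))  # a probability strictly between 0 and 1
```

In production, the same arithmetic would typically be applied row by row over a Spark DataFrame of customer features, so that rescoring after a model refresh is just a matter of rerunning the notebook with the new coefficients.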