In the last section, we completed our model estimation task. Now it is time to evaluate the estimated models against our model quality criteria, so that we can either move on to the next stage, results explanation, or go back to an earlier stage to refine our models.
To perform our model evaluation, in this section we will use RMSE (Root-Mean-Square Error) and ROC (Receiver Operating Characteristic) curves to assess the quality of fit of our models. Both must be calculated on our test data rather than on the training data used to estimate the models, because error measured on the training data tends to overstate how well a model will perform on new data.
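To make the two measures concrete, here is a minimal sketch in plain Python of how RMSE and the points of an ROC curve can be computed from held-out test data. The function names (rmse, roc_points, auc) and the sample values are illustrative assumptions, not MLlib or R APIs, and the area under the curve is added only as a convenient single-number summary of the ROC curve.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between observed and predicted test values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs for a binary model.

    labels are 0/1 test outcomes; scores are the model's predicted
    probabilities (or any ranking score) for the positive class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("test data must contain both classes")

    # Walk through the records from the highest score to the lowest,
    # lowering the decision threshold one distinct score at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last record sharing this score.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve, using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical test-set values, for illustration only.
    print(rmse([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0]))
    curve = roc_points([1, 0, 1, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
    print(curve)
    print(auc(curve))
```

A lower RMSE indicates a closer fit, while an ROC curve that bends further toward the top-left corner (an area closer to 1) indicates better separation of the two classes.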
Many packages already include algorithms that let users assess models quickly. For example, both MLlib and R have algorithms that return a confusion matrix for logistic regression models and even calculate the number of false positives.
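As a reminder of what such routines report, the following minimal sketch counts the four cells of a binary confusion matrix in plain Python. The function name confusion_counts and the sample labels are hypothetical and chosen for illustration; this is not the MLlib or R implementation.

```python
def confusion_counts(labels, predictions):
    """Count true/false positives and negatives for 0/1 labels and predictions."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have equal length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, prediction in zip(labels, predictions):
        if label not in (0, 1) or prediction not in (0, 1):
            raise ValueError("labels and predictions must be 0 or 1")
        if prediction == 1:
            # Predicted positive: correct if the actual label is also positive.
            counts["tp" if label == 1 else "fp"] += 1
        else:
            # Predicted negative: correct if the actual label is also negative.
            counts["tn" if label == 0 else "fn"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical test labels and model predictions, for illustration only.
    print(confusion_counts([1, 0, 1, 1, 0, 0], [1, 1, 0, 1, 0, 0]))
```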
Specifically, MLlib has the confusionMatrix and numFalseNegatives() functions for us to use and even...