In the previous recipes, we saw some of the values predicted by our classification and regression models and how close they were to the original values. In this recipe, we will learn how to calculate the full set of performance statistics for these models.
To execute this recipe, you need a working Spark environment, and you should have gone through the Predicting hours of work for census respondents and Forecasting income levels of census respondents recipes presented earlier in this chapter.
No other prerequisites are required.
Getting the performance metrics for regression and classification in Spark is straightforward:
import pyspark.mllib.evaluation as ev
(...)
metrics_lm = ev.RegressionMetrics(true_pred_reg)
(...)
metrics_lr = ev.BinaryClassificationMetrics(true_pred_class_lr)
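To make it clearer what these metrics objects summarize, here is a minimal plain-Python sketch of the underlying statistics. It runs without Spark and is not Spark's implementation. The helper names `regression_metrics` and `area_under_roc` and the sample pairs are hypothetical, introduced only for illustration. The sketch assumes regression input as (prediction, observation) pairs and classification input as (score, label) pairs, which is the ordering the `pyspark.mllib.evaluation` classes document for their input RDDs.

```python
import math


def regression_metrics(pairs):
    """Summarize (prediction, observation) pairs; hypothetical helper, not a Spark API."""
    pairs = list(pairs)
    n = len(pairs)
    mean_obs = sum(obs for _, obs in pairs) / n
    # Residual and total sums of squares around the observed mean
    ss_res = sum((obs - pred) ** 2 for pred, obs in pairs)
    ss_tot = sum((obs - mean_obs) ** 2 for _, obs in pairs)
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(obs - pred) for pred, obs in pairs) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


def area_under_roc(pairs):
    """Area under the ROC curve from (score, label) pairs; hypothetical helper.

    Uses the pairwise definition: the share of (positive, negative) pairs in
    which the positive example gets the higher score, counting ties as half.
    """
    pairs = list(pairs)
    pos = [score for score, label in pairs if label == 1.0]
    neg = [score for score, label in pairs if label == 0.0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up sample data, for illustration only
sample_reg = [(2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0)]
sample_class = [(0.1, 0.0), (0.4, 0.0), (0.35, 1.0), (0.8, 1.0)]

print(regression_metrics(sample_reg))
print(area_under_roc(sample_class))
```

In Spark itself, the corresponding values are read from the metrics objects as properties, for example `metrics_lm.rootMeanSquaredError`, `metrics_lm.r2`, and `metrics_lr.areaUnderROC`. Spark's figures may differ slightly from this sketch, since it builds the ROC curve from thresholds rather than comparing every pair.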