At this point, we are ready to evaluate our model's performance.
This section will require importing the following libraries:
- metrics from sklearn
- BinaryClassificationEvaluator from pyspark.ml.evaluation
This section walks through the steps to evaluate the TF-IDF NLP model.
- Create a confusion matrix using the following script:
predictionDF.crosstab('label', 'prediction').show()
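The crosstab above tallies how often each actual label coincides with each predicted label. As a minimal sketch of what those cells contain, assuming hypothetical binary label/prediction pairs in place of predictionDF rows:

```python
from collections import Counter

# Hypothetical (label, prediction) pairs standing in for predictionDF rows
pairs = [(1, 1), (0, 0), (1, 0), (0, 0), (1, 1), (0, 1)]

# Count each (label, prediction) combination, mirroring the crosstab's cells
matrix = Counter(pairs)

print('true positives:', matrix[(1, 1)])   # label 1 predicted as 1
print('true negatives:', matrix[(0, 0)])   # label 0 predicted as 0
print('false negatives:', matrix[(1, 0)])  # label 1 predicted as 0
print('false positives:', matrix[(0, 1)])  # label 0 predicted as 1
```

The diagonal cells (true positives and true negatives) are the correct predictions; everything else is an error.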
- Evaluate the model using metrics from sklearn with the following script:
from sklearn import metrics
actual = predictionDF.select('label').toPandas()
predicted = predictionDF.select('prediction').toPandas()
print('accuracy score: {}%'.format(round(metrics.accuracy_score(actual, predicted), 3)*100))
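Accuracy is simply the fraction of predictions that match the actual labels, which can also be read straight off the confusion matrix. A minimal sketch using hypothetical cell counts (not values from the actual model):

```python
# Hypothetical confusion-matrix cell counts for illustration
tp, tn, fp, fn = 2, 2, 1, 1

# Accuracy = correct predictions (the diagonal) over all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)

print('accuracy score: {}%'.format(round(accuracy, 3) * 100))
```

This is the same quantity metrics.accuracy_score computes from the label and prediction columns.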
- Calculate the ROC score using the following script:
from pyspark.ml.evaluation import BinaryClassificationEvaluator
scores = predictionDF.select('label', 'rawPrediction')
evaluator = BinaryClassificationEvaluator()
print('The ROC score is {}%'.format(round(evaluator.evaluate(scores), 3)*100))
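The ROC score (area under the ROC curve) can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch of that rank-based interpretation, using hypothetical label/score pairs rather than the model's rawPrediction column:

```python
# Hypothetical (label, score) pairs standing in for 'label'/'rawPrediction'
data = [(1, 0.9), (0, 0.3), (1, 0.6), (0, 0.4), (1, 0.2), (0, 0.1)]

pos = [score for label, score in data if label == 1]
neg = [score for label, score in data if label == 0]

# AUC = probability a random positive outscores a random negative,
# counting ties as half a win
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
           for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))

print('The ROC score is {}%'.format(round(auc, 3) * 100))
```

A score of 50% means the model ranks positives no better than chance; 100% means every positive outscores every negative.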