The stage is now set to apply the model to the dataframe.
This section focuses on applying a very common classification model, logistic regression, which requires the following imports from Spark:
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.classification import LogisticRegression
This section will walk through the steps of applying our model and evaluating the results.
- Execute the following script to lump all of the feature variables in the dataframe in a list called features:
features = df.columns[1:]
- Execute the following to import VectorAssembler and configure the fields that will be assigned to the feature vector by assigning the inputCols and outputCol:
from pyspark.ml.feature import VectorAssembler
feature_vectors = VectorAssembler(
    inputCols=features,
    outputCol="features")
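The slice in the first step works because df.columns is an ordinary Python list of column names. The following is a minimal sketch in plain Python, not Spark code: the column names and the sample row are hypothetical, and it assumes the first column holds the label. It shows what ends up in features and, conceptually, what the assembler packs into the output column for each row.

```python
# Hypothetical column names standing in for df.columns; the first
# column is assumed to hold the label, the rest are features.
columns = ["label", "age", "income", "visits"]

# Same slice as in the recipe: skip the first entry, keep the rest.
features = columns[1:]

# Hypothetical record standing in for one row of the dataframe.
row = {"label": 1, "age": 34, "income": 52000.0, "visits": 7}

# Conceptually, VectorAssembler collects the inputCols values of each
# row, in order, into a single vector stored under outputCol.
assembled = [float(row[name]) for name in features]

print(features)   # ['age', 'income', 'visits']
print(assembled)  # [34.0, 52000.0, 7.0]
```

In Spark itself nothing is packed when the assembler is configured; the vectors are produced only when the assembler's transform method is called on the dataframe.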