In this section, we introduce the process of preparing the data prior to applying Spark MLlib algorithms. Typically, Spark MLlib classification algorithms require two columns, called label and features. We will illustrate this with the following example.
We import the required classes for this section:
scala> import org.apache.spark.ml.Pipeline
scala> import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
scala> import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
scala> import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorIndexer}
scala> import org.apache.spark.ml.linalg.Vectors
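To make the label and features layout concrete, here is a minimal sketch of how such a DataFrame might be built. It is not the dataset used in this section: the toy rows and the column names x1, x2, and outcome are hypothetical, and VectorAssembler is an extra import that does not appear in the list above. StringIndexer turns a string column into a numeric label, and VectorAssembler packs the numeric inputs into a single vector column.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// In spark-shell a session already exists; getOrCreate() reuses it.
val spark = SparkSession.builder()
  .appName("label-features-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy data: two numeric inputs and a string outcome.
val raw = Seq(
  (5.1, 3.5, "yes"),
  (4.9, 3.0, "no"),
  (6.2, 2.9, "yes")
).toDF("x1", "x2", "outcome")

// Map the string outcome to a numeric "label" column.
val indexed = new StringIndexer()
  .setInputCol("outcome")
  .setOutputCol("label")
  .fit(raw)
  .transform(raw)

// Pack the numeric inputs into a single vector-valued "features" column.
// VectorAssembler is an assumption here; it is not among the imports above.
val prepared = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("features")
  .transform(indexed)
  .select("label", "features")

prepared.printSchema()
```

The resulting DataFrame has exactly the two columns that the classification estimators look for by default, so it could be passed to a classifier such as RandomForestClassifier without setting custom column names.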