While training an SVM, the modeler has to make a number of decisions:
How to pre-process the data (transformation and scaling): categorical variables should be converted to numeric ones by creating dummy (one-hot) variables, and the numeric values should be scaled to a common range (either 0 to 1 or -1 to +1).
Which kernel to use (if you cannot visualize the data and/or draw a conclusion from it, compare kernels using cross-validation).
What values to set for the SVM parameters: the penalty parameter and the kernel parameters (found using cross-validation or grid search).
If needed, you can use entropy-based feature selection to include only the most important features in your model.
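The pre-processing decision above can be sketched in plain Scala (without Spark); this is a minimal illustration, and all names and data in it are assumptions, not from the source: one-hot ("dummy") encoding for a categorical value, and min-max scaling of a numeric column to the 0-to-1 range.

```scala
// Illustrative sketch of SVM pre-processing: dummy (one-hot) encoding
// for categoricals and min-max scaling to [0, 1] for numerics.
// All names and data here are made up for illustration.
object Preprocess {
  // Turn a categorical value into an indicator vector against a
  // fixed, ordered list of known categories.
  def oneHot(value: String, categories: Seq[String]): Seq[Double] =
    categories.map(c => if (c == value) 1.0 else 0.0)

  // Rescale a numeric column so its minimum maps to 0 and its maximum to 1.
  def minMaxScale(xs: Seq[Double]): Seq[Double] = {
    val (lo, hi) = (xs.min, xs.max)
    xs.map(x => if (hi == lo) 0.0 else (x - lo) / (hi - lo))
  }
}
```

For example, `Preprocess.oneHot("green", Seq("red", "green", "blue"))` yields `Seq(0.0, 1.0, 0.0)`, and `Preprocess.minMaxScale(Seq(2.0, 4.0, 6.0))` yields `Seq(0.0, 0.5, 1.0)`.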
Scala:
scala> import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
scala> import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
scala> import org.apache.spark.mllib.util.MLUtils
...
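For intuition about what the imported `SVMModel` does at prediction time: MLlib's `SVMWithSGD` fits a linear SVM, so the model scores a point by the dot product of the learned weight vector with the features plus an intercept, and classifies by comparing that margin to a threshold (0.0 by default). A minimal plain-Scala sketch of that decision rule follows; the weights and intercept are made-up values for illustration, not learned by any model:

```scala
// Sketch of the linear SVM decision rule applied by an MLlib SVMModel.
// The weight vector and intercept below are hypothetical, for illustration.
val weights   = Array(0.5, -1.0)  // stand-in for learned weights w
val intercept = 0.25              // stand-in for learned intercept b

// Raw margin score: w . x + b
def score(x: Array[Double]): Double =
  weights.zip(x).map { case (w, xi) => w * xi }.sum + intercept

// Default threshold of 0.0: non-negative margin => class 1.0, else 0.0.
def predict(x: Array[Double]): Double =
  if (score(x) >= 0.0) 1.0 else 0.0
```

With these stand-in parameters, `predict(Array(1.0, 0.5))` returns 1.0 (margin 0.25) and `predict(Array(0.0, 1.0))` returns 0.0 (margin -0.75). Calling `clearThreshold()` on a real MLlib model makes `predict` return the raw margin instead of the 0/1 label, which is what `BinaryClassificationMetrics` needs for ROC computation.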