In this recipe, we will show you how to solve a classification problem with MLlib by building two models: the ubiquitous logistic regression and a slightly more sophisticated model, the support vector machine (SVM).
To execute this recipe, you need a working Spark environment. You should also have gone through the Creating an RDD for training recipe, where we created the training and testing datasets for estimating classification models.
No other prerequisites are required.
Just like with linear regression, building a logistic regression model starts with the LogisticRegressionWithSGD class. Calling its .train(...) method on the training RDD returns the fitted model:
import pyspark.mllib.classification as cl

income_model_lr = cl.LogisticRegressionWithSGD.train(final_data_income_train)
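To get a feel for what training a logistic regression with stochastic gradient descent involves, here is a minimal pure-Python sketch. It is not MLlib's implementation and does not need Spark. The names train_logistic_sgd, predict, and toy_data, as well as the step size and the tiny dataset, are made up for illustration. The (label, features) pairs only loosely mirror the labeled points in the training RDD.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_sgd(data, iterations=100, step=0.5, seed=42):
    """Fit weights and an intercept with plain SGD on the log-loss.

    `data` is a list of (label, features) pairs with label in {0, 1}.
    This is an illustrative sketch, not how MLlib distributes the work.
    """
    rng = random.Random(seed)
    n_features = len(data[0][1])
    weights = [0.0] * n_features
    intercept = 0.0
    for _ in range(iterations):
        shuffled = data[:]
        rng.shuffle(shuffled)  # visit the observations in random order
        for label, features in shuffled:
            z = intercept + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of the log-loss w.r.t. z
            weights = [w - step * error * x
                       for w, x in zip(weights, features)]
            intercept -= step * error
    return weights, intercept


def predict(weights, intercept, features, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(z) >= threshold else 0


# Hypothetical toy dataset: two features, clearly separable classes.
toy_data = [
    (0, [0.0, 0.2]), (0, [0.3, 0.1]), (0, [0.2, 0.4]),
    (1, [0.9, 0.8]), (1, [0.7, 1.0]), (1, [1.0, 0.6]),
]

toy_weights, toy_intercept = train_logistic_sgd(toy_data)
toy_predictions = [predict(toy_weights, toy_intercept, f) for _, f in toy_data]
```

The sketch updates the coefficients one observation at a time. MLlib performs the equivalent computation over the partitions of an RDD, so treat this only as a way to build intuition for what the .train(...) call is doing.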