Before we can train an ML model, we need to create an RDD in which each element is a labeled point, that is, a label paired with its vector of features. In this recipe, we will use the final_data
RDD we created in the previous recipe to prepare the RDDs we will train on.
To execute this recipe, you need a working Spark environment. You should also have completed the previous recipe, where we standardized the encoded census data.
No other prerequisites are required.
Many MLlib models require an RDD of labeled points for training. The next code snippets create such RDDs so that we can build classification and regression models.