Similar to the previous recipe, we use the DecisionTree() class to train a regression tree model and use it to predict an outcome. To refresh, all these models are variations on CART (Classification and Regression Trees), which comes in two modes: classification and regression. In this recipe, we explore the regression API of the tree implementation in Spark.
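In regression mode, a CART-style tree picks each split by how much it reduces the variance of the labels. In spark.mllib, "variance" is the impurity used for regression trees. The plain-Scala sketch below illustrates that criterion for a single continuous feature without using Spark. The object and function names (VarianceSplit, variance, varianceGain, bestThreshold) and the toy data are hypothetical. Spark's own implementation differs: it bins continuous features (maxBins) and evaluates candidate splits in a distributed way.

```scala
// Illustrative only: a plain-Scala version of the variance-reduction split
// criterion used by regression trees; not Spark's internal implementation.
object VarianceSplit {

  // Population variance of the labels that reach a node (0.0 for an empty node).
  def variance(ys: Seq[Double]): Double =
    if (ys.isEmpty) 0.0
    else {
      val mean = ys.sum / ys.size
      ys.map(y => (y - mean) * (y - mean)).sum / ys.size
    }

  // Parent variance minus the size-weighted variance of the two children
  // produced by the split "feature <= threshold".
  def varianceGain(xs: Seq[Double], ys: Seq[Double], threshold: Double): Double = {
    val (left, right) = xs.zip(ys).partition { case (x, _) => x <= threshold }
    val n = ys.size.toDouble
    val childVariance =
      (left.size / n) * variance(left.map(_._2)) +
        (right.size / n) * variance(right.map(_._2))
    variance(ys) - childVariance
  }

  // Tries each distinct feature value as a threshold and returns
  // (threshold, gain) for the largest variance reduction (first one on ties).
  def bestThreshold(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.nonEmpty && xs.size == ys.size, "need paired, non-empty inputs")
    xs.distinct
      .map(t => (t, varianceGain(xs, ys, t)))
      .reduce((a, b) => if (b._2 > a._2) b else a)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy data: two clearly separated clusters of labels.
    val xs = Seq(1.0, 2.0, 3.0, 10.0, 11.0, 12.0)
    val ys = Seq(5.0, 6.0, 7.0, 20.0, 21.0, 22.0)
    val (t, gain) = bestThreshold(xs, ys)
    println(s"best threshold = $t, variance reduction = $gain")
  }
}
```

On this toy data, the best split is at threshold 3.0, which separates the two clusters. The parent variance of about 56.92 drops to a weighted child variance of 2/3, a reduction of 56.25.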
- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter10
- Import the necessary packages for the Spark context to get access to the cluster, and Log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel
...
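The sketch below shows roughly how these imported classes fit together. It makes several assumptions that are not taken from the recipe:

- A local SparkSession (Spark 2.x or later).
- A tiny hypothetical dataset in place of the recipe's real data.
- A hypothetical object name (RegressionTreeSketch).
- Illustrative parameter values (maxDepth = 5, maxBins = 32).

It calls DecisionTree.trainRegressor with the "variance" impurity and then passes (prediction, label) pairs to RegressionMetrics to compute the error on the training data.

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel
import org.apache.spark.sql.SparkSession

// Hypothetical object name; a minimal sketch, not the recipe's own program.
object RegressionTreeSketch {

  // Trains a regression tree on toy data and returns the training RMSE.
  def run(): Double = {
    // Reduce Spark's console output via Log4j, as the recipe suggests.
    Logger.getLogger("org").setLevel(Level.ERROR)

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RegressionTreeSketch")
      .getOrCreate()

    try {
      // Hypothetical toy data: one continuous feature, two clusters of labels.
      val data = spark.sparkContext.parallelize(Seq(
        LabeledPoint(5.0, Vectors.dense(1.0)),
        LabeledPoint(6.0, Vectors.dense(2.0)),
        LabeledPoint(7.0, Vectors.dense(3.0)),
        LabeledPoint(20.0, Vectors.dense(10.0)),
        LabeledPoint(21.0, Vectors.dense(11.0)),
        LabeledPoint(22.0, Vectors.dense(12.0))
      ))

      // Regression mode of the MLlib tree; "variance" is the regression impurity.
      val model: DecisionTreeModel = DecisionTree.trainRegressor(
        data,
        Map[Int, Int](), // empty map: every feature is treated as continuous
        "variance",
        5,               // maxDepth (illustrative)
        32               // maxBins (illustrative)
      )

      // RegressionMetrics expects (prediction, observation) pairs.
      val predictionAndLabel = data.map(p => (model.predict(p.features), p.label))
      val metrics = new RegressionMetrics(predictionAndLabel)
      println(s"MSE = ${metrics.meanSquaredError}, RMSE = ${metrics.rootMeanSquaredError}")
      metrics.rootMeanSquaredError
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    run()
  }
}
```

Because the sketch evaluates on the same points it trained on, the RMSE is only a sanity check. A real workflow would score a held-out test split instead.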