In this recipe, we explore multilabel classification with MultilabelMetrics in Spark 2.0, which should not be confused with the previous recipe dealing with multiclass classification and MulticlassMetrics. The key to this recipe is to focus on evaluation metrics such as Hamming loss, accuracy, F1-measure, and so on, and on what they measure.
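For orientation: with N documents and L distinct labels, Hamming loss is the fraction of individual label assignments that are wrong,

Hamming loss = (1 / (N × L)) × Σᵢ |predictionsᵢ Δ labelsᵢ|

where Δ denotes the symmetric difference between the predicted and true label sets of document i; this is the standard definition and matches what MultilabelMetrics.hammingLoss reports. Accuracy and F1-measure, by contrast, are per-document set-overlap measures averaged over all documents.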
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
- Import the necessary packages for the Spark session to get access to the cluster and to the MultilabelMetrics API:
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.evaluation.MultilabelMetrics
import org.apache.spark.rdd.RDD
- Create Spark's configuration and the SparkSession:
val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("myMultilabel")
  .config("spark.sql.warehouse.dir", ".")
  .getOrCreate()
- We create the...
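MultilabelMetrics consumes an RDD of (predictions, labels) pairs, where each element of a pair is an Array[Double] of label indices for one document. As a minimal sketch of this step and the evaluation that follows, the sample pairs below are illustrative placeholders rather than the recipe's actual dataset:

// Illustrative (predictions, labels) pairs only; each Array[Double]
// holds the label indices assigned to one document
val data: RDD[(Array[Double], Array[Double])] = spark.sparkContext.parallelize(
  Seq(
    (Array(0.0, 1.0), Array(0.0, 2.0)),
    (Array(0.0, 2.0), Array(0.0, 1.0)),
    (Array.empty[Double], Array(0.0)),
    (Array(2.0), Array(2.0)),
    (Array(2.0, 0.0), Array(2.0, 0.0)),
    (Array(0.0, 1.0, 2.0), Array(0.0, 1.0)),
    (Array(1.0), Array(1.0, 2.0))
  ))

val metrics = new MultilabelMetrics(data)

// Aggregate measures discussed in the introduction
println(s"Hamming loss = ${metrics.hammingLoss}")
println(s"Accuracy = ${metrics.accuracy}")
println(s"Micro F1-measure = ${metrics.microF1Measure}")

spark.stop()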