In this recipe, we explore how functional programming works with Dataset. We use the Dataset API and functional programming to separate the cars (domain objects) by their models.
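As a rough illustration of the idea, and not the recipe's own code, the sketch below shows one way typed Dataset operations could separate cars by model. `DemoCar`, its `model` field, the sample rows, and the helper names `carsOfModel` and `countsByModel` are all hypothetical stand-ins; the recipe's actual `Car` class is defined in the steps that follow. It assumes Spark SQL is on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical stand-in for the recipe's Car class; the fields are assumptions.
case class DemoCar(make: String, model: String)

object DatasetByModelSketch {

  // Typed filter: keep only the cars whose model matches.
  def carsOfModel(cars: Dataset[DemoCar], model: String): Dataset[DemoCar] =
    cars.filter(_.model == model)

  // Group by a key computed with a plain Scala function, then count each group.
  def countsByModel(spark: SparkSession, cars: Dataset[DemoCar]): Dataset[(String, Long)] = {
    import spark.implicits._ // supplies the Encoder for the String key
    cars.groupByKey(_.model).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dataset-by-model-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, for illustration only.
    val cars: Dataset[DemoCar] = Seq(
      DemoCar("Tesla", "Model S"),
      DemoCar("Tesla", "Model X"),
      DemoCar("Tesla", "Model S")
    ).toDS()

    carsOfModel(cars, "Model S").show()
    countsByModel(spark, cars).show()

    spark.stop()
  }
}
```

Because the lambdas operate on `DemoCar` values rather than on untyped columns, a misspelled field name is caught at compile time, which is the main draw of combining Dataset with functional-style code.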
- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
- Use the package statement to declare where the program resides:
package spark.ml.cookbook.chapter3
- Import the necessary packages for the Spark session to get access to the cluster, and Log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{Dataset, SparkSession}
import spark.ml.cookbook.{Car, mydatasetdata}
import scala.collection.mutable
import scala.collection.mutable.ListBuffer
- Define a Scala case class to contain our data for processing. Our Car class will represent electric and hybrid cars.
case class Car(make: String...