Bivariate analysis examines the relationship between two variables. It looks for association or disassociation (independence) between the two, judged at a predefined significance level. The analysis can be performed for any combination of categorical and continuous variables: both variables categorical, one categorical and one continuous, or both continuous.
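To make two of these combinations concrete, here is a minimal plain-Scala sketch that needs no Spark. It computes the Pearson correlation coefficient for a continuous/continuous pair and the Pearson chi-square statistic of independence (with its degrees of freedom) for a categorical/categorical pair summarized as a contingency table. The object name `Bivariate` and its methods are hypothetical and exist only for illustration. Converting the chi-square statistic to a p-value for comparison against the significance level requires the chi-square CDF, which this sketch does not implement. The categorical/continuous case (for example, ANOVA) is also not covered.

```scala
// Hypothetical helper object: a plain-Scala sketch of two bivariate measures.
object Bivariate {

  // Pearson correlation coefficient for two paired continuous variables.
  // Returns a value in [-1, 1]; values near 0 suggest no linear association.
  def pearson(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.length == y.length && x.length > 1, "need at least two paired samples")
    val n = x.length
    val meanX = x.sum / n
    val meanY = y.sum / n
    // Sum of co-deviations (proportional to the covariance).
    val coDev = x.zip(y).map { case (a, b) => (a - meanX) * (b - meanY) }.sum
    // Square roots of the sums of squared deviations (proportional to the std devs).
    val devX = math.sqrt(x.map(a => (a - meanX) * (a - meanX)).sum)
    val devY = math.sqrt(y.map(b => (b - meanY) * (b - meanY)).sum)
    coDev / (devX * devY)
  }

  // Pearson chi-square statistic for independence on an r x c contingency table.
  // Rows are the categories of one variable, columns the categories of the other;
  // each cell holds an observed count. Returns (statistic, degrees of freedom).
  def chiSquare(observed: Array[Array[Double]]): (Double, Int) = {
    val rowTotals = observed.map(_.sum)
    val colTotals = observed.transpose.map(_.sum)
    val total = rowTotals.sum
    val stat = (for {
      i <- observed.indices
      j <- observed(i).indices
    } yield {
      // Expected count under the hypothesis that the two variables are independent.
      val expected = rowTotals(i) * colTotals(j) / total
      val diff = observed(i)(j) - expected
      diff * diff / expected
    }).sum
    val df = (observed.length - 1) * (observed.head.length - 1)
    (stat, df)
  }
}
```

For example, `Bivariate.chiSquare(Array(Array(10.0, 20.0), Array(20.0, 10.0)))` yields a statistic of 20/3 with 1 degree of freedom. On a cluster, Spark MLlib provides comparable routines in `org.apache.spark.mllib.stat.Statistics`, such as `Statistics.corr` and `Statistics.chiSqTest`, the latter of which also reports a p-value.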
To step through this recipe, you will need a running Spark cluster in any one of the supported modes: local, standalone, YARN, or Mesos. For installing Spark on a standalone cluster, refer to http://spark.apache.org/docs/latest/spark-standalone.html. Also, include the Spark MLlib package in the build.sbt file so that sbt downloads the related libraries and the MLlib API can be used. Install Scala, Java, and, optionally, Hadoop.
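A minimal build.sbt sketch for pulling in Spark MLlib might look like the following. The Scala and Spark version numbers are assumptions chosen for illustration; set them to match the Spark installation on your cluster, since the `%%` operator appends the Scala binary version to the artifact name.

```scala
// build.sbt (sketch): the versions below are assumptions; match them to your cluster.
scalaVersion := "2.12.18"

val sparkVersion = "3.5.1"

libraryDependencies ++= Seq(
  // Core Spark APIs (SparkContext, RDDs).
  "org.apache.spark" %% "spark-core"  % sparkVersion,
  // MLlib, which includes org.apache.spark.mllib.stat.Statistics.
  "org.apache.spark" %% "spark-mllib" % sparkVersion
)
```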