In previous chapters, we included dependencies by specifying them in a build.sbt file and relying on SBT to fetch them from the Maven Central repository. For Apache Spark, it is more common to download the source code or pre-built binaries explicitly, since Spark ships with many command-line scripts that greatly facilitate launching jobs and interacting with a cluster.
Head over to http://spark.apache.org/downloads.html and download Spark 1.5.2, choosing the "pre-built for Hadoop 2.6 or later" package. You can also build Spark from source if you need customizations, but we will stick with the pre-built version since it requires no further setup.
Clicking Download fetches a tarball, which you can unpack with the following command:
$ tar xzf spark-1.5.2-bin-hadoop2.6.tgz
This will create a spark-1.5.2-bin-hadoop2.6 directory. To verify that Spark works correctly, navigate to spark-1.5.2-bin-hadoop2.6/bin and launch the Spark shell using ./spark-shell. This is just a Scala...
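As a quick sanity check once the shell is up, you might run a small distributed computation. The following is a sketch of a spark-shell session; it assumes the shell started successfully and, as Spark 1.5 does by default, bound a ready-made SparkContext to the variable sc:

```scala
// Inside spark-shell; `sc` is the SparkContext created for you at startup.
scala> val rdd = sc.parallelize(1 to 1000)  // distribute the range across the cluster (here, local threads)
scala> rdd.sum()                            // trigger a distributed sum
res0: Double = 500500.0
```

If the sum comes back without errors, Spark is installed and working. The same sc object is the entry point for all the RDD operations used in the rest of this chapter.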