Spark itself is written in a programming language called Scala and runs on the Java Virtual Machine. However, you are not restricted to using Scala. Spark exposes APIs that allow Spark programs to be written in several languages:
- R
- Scala
- Java
- Python
- Clojure
We will be demonstrating some of the examples in this chapter using SparkR. SparkR is an R package that provides a frontend for using Apache Spark from R, allowing data scientists to interactively run jobs on a cluster from R. One big advantage of SparkR for the traditional R programmer is that it builds on concepts they already know; for example, the familiar notion of a dataframe is also available within SparkR.
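As a brief illustration of this familiarity, the following sketch shows how a SparkR session might be started and a distributed dataframe created from R's built-in `faithful` dataset. It assumes Spark and the SparkR package are installed locally, with `SPARK_HOME` pointing at the Spark installation; the paths are placeholders, not part of the original text.

```r
# Load the SparkR package (assumes SPARK_HOME is set to a local Spark install)
library(SparkR)

# Start a local SparkR session
sparkR.session(master = "local[*]", appName = "SparkRExample")

# Create a Spark dataframe from R's built-in 'faithful' dataset
df <- createDataFrame(faithful)

# Familiar dataframe-style operations, executed by Spark
head(df)
printSchema(df)

# Stop the session when done
sparkR.session.stop()
```

Note how `createDataFrame` and `head` mirror idioms an R programmer already uses with local data frames, while the work is carried out by the Spark engine.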