Apache Spark for Data Science Cookbook
In this recipe, we'll see how to run SQL queries over SparkR DataFrames and cache the datasets.
To step through this recipe, you will need a running Spark cluster, either in pseudo-distributed mode or in one of the distributed modes (standalone, YARN, or Mesos). You will also need RStudio. Refer to the Installing R recipe for details on installing R, and to the Creating SparkR DataFrames recipe to get acquainted with creating DataFrames from a variety of data sources.
The following code shows how to apply SQL queries over SparkR DataFrames using Spark 1.6.0. As of Spark 2.0.2, the methods remain the same, except that a SparkSession is used instead of SQLContext:
people.json contains the following content:
{"name":"Michael"}
{"name":"Andy", "age":30}
{"...
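As a rough illustration of the workflow described above, here is a minimal sketch using the Spark 2.x SparkR API (a SparkSession rather than SQLContext). It assumes SparkR is installed and loadable, that a local master (`local[*]`) is acceptable, and it writes only the two complete records shown above to a temporary file instead of reading the real people.json. The query and the view name `people` are illustrative choices, not the recipe's own code.

```r
# Assumption: SparkR is on the library path (e.g. installed from $SPARK_HOME/R/lib)
library(SparkR)

# Start (or reuse) a local SparkSession; Spark 2.x replaces SQLContext with this
sparkR.session(master = "local[*]", appName = "SparkRSQLSketch")

# Write the two complete sample records to a temporary JSON Lines file
path <- tempfile(fileext = ".json")
writeLines(c('{"name":"Michael"}', '{"name":"Andy", "age":30}'), path)

# Load the JSON file as a SparkDataFrame; the schema (name, age) is inferred
people <- read.json(path)

# Mark the DataFrame for in-memory caching; it is materialized on the first action
cache(people)

# Register a temporary view so the data can be queried with SQL
createOrReplaceTempView(people, "people")

# Run a SQL query; only Andy has an age, so only that row should match
withAge <- sql("SELECT name, age FROM people WHERE age IS NOT NULL")

# Bring results back to the R driver as a regular data.frame
n_people <- count(people)
result <- collect(withAge)
print(result)

sparkR.session.stop()
```

With Spark 1.6, the equivalent calls take the SQLContext explicitly: `sqlContext <- sparkRSQL.init(sc)`, `read.json(sqlContext, path)`, `registerTempTable(people, "people")`, and `sql(sqlContext, "SELECT ...")`.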