The SQL context is the starting point for working with columnar data in Apache Spark. It is created from the Spark context and provides the means for loading and saving data files of different types, working with DataFrames, and manipulating columnar data with SQL, among other things. It can be used for the following (see the sketch after this list):
- Executing SQL via the sql method
- Registering user-defined functions via the udf method
- Caching
- Configuration
- DataFrames
- Data source access
- DDL operations
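As a minimal sketch of the first four items, the following lines assume Spark 1.x, an existing Spark context named sc, a SQL context created as shown just after this list, and a hypothetical registered table called people:

// Register a user-defined function via the udf method (hypothetical UDF)
sqlContext.udf.register("toUpper", (s: String) => s.toUpperCase)
// Execute SQL via the sql method
val upper = sqlContext.sql("SELECT toUpper(name) FROM people")
// Cache the table's data in memory
sqlContext.cacheTable("people")
// Set a configuration property
sqlContext.setConf("spark.sql.shuffle.partitions", "8")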
I am sure that there are other areas, but you get the idea. The examples in this chapter are written in Scala, simply because I prefer the language, but you can develop in Python and Java as well. As shown previously, the SQL context is created from the Spark context. Importing the SQL context's implicits allows you to implicitly convert RDDs into DataFrames:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
For instance, the previous implicits call allows you to import a CSV file...
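One plausible shape for that, sketched under assumptions: Spark 1.x, a hypothetical comma-separated file people.csv holding name,age records, and a Person case class of my own invention. The file is parsed into an RDD of case class instances, and the implicits import supplies the toDF method:

// Hypothetical record layout for people.csv: name,age
case class Person(name: String, age: Int)

val people = sc.textFile("people.csv")
  .map(_.split(","))
  .map(f => Person(f(0), f(1).trim.toInt))
  .toDF() // implicit RDD-to-DataFrame conversion from sqlContext.implicits._

// Register the DataFrame so it can be queried via sqlContext.sql
people.registerTempTable("people")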