Spark can process data from various sources, such as HDFS, Cassandra, and relational databases. Unlike relational database systems, big data frameworks do not enforce a schema when data is written into them. HDFS is a perfect example: any arbitrary file is welcome during the write phase, and the same is true of Amazon S3. Reading data is a different story, however. You need to impose some structure on even completely unstructured data to make sense of it, and once the data has that structure, SQL becomes a very handy tool for querying it.
Spark SQL is a component of the Spark ecosystem, first introduced in Spark 1.0. It grew out of a project named Shark, which was an attempt to make Hive run on Spark.
Hive is essentially a relational abstraction; it converts SQL queries into MapReduce jobs. See the following figure:
Shark replaced the MapReduce part with Spark while retaining most of the code...