Spark SQL is generally used in two different ways. The first is as a library: you embed SQL, HiveQL, or DataFrame DSL queries in applications written in Java, Scala, Python, or R. The second is as a distributed SQL engine: clients connect to a Thrift server and submit SQL or HiveQL queries through JDBC or ODBC interfaces. The latter is especially useful for data warehousing users, who can write and execute queries interactively from Business Intelligence (BI) tools. Spark SQL can therefore serve both as a data warehousing solution and as a distributed SQL query engine.
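To illustrate the library usage, here is a minimal Scala sketch that registers a DataFrame as a temporary view and queries it with SQL. It assumes a Spark 2.x-style SparkSession and uses a sample JSON file shipped with the Spark distribution; on Spark 1.6 the equivalent entry point is HiveContext.

    import org.apache.spark.sql.SparkSession

    object SparkSqlExample {
      def main(args: Array[String]): Unit = {
        // Entry point for Spark SQL (Spark 2.x and later)
        val spark = SparkSession.builder()
          .appName("SparkSqlExample")
          .getOrCreate()

        // Load structured data; any supported data source works here
        val people = spark.read.json("examples/src/main/resources/people.json")

        // Register the DataFrame as a temporary view and query it with SQL
        people.createOrReplaceTempView("people")
        spark.sql("SELECT name, age FROM people WHERE age > 21").show()

        spark.stop()
      }
    }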
Spark SQL's Thrift server provides JDBC access to Spark SQL. The Thrift JDBC server corresponds to HiveServer2 in Hive, so you can test it with the Beeline client or any SQL client that speaks JDBC. As of Spark 1.6, the Thrift server runs in multi-session mode by default, meaning each JDBC/ODBC connection owns its own SQL configuration and temporary views.
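As a quick smoke test, the following shell commands start the Thrift server and connect to it with the bundled Beeline client. The paths assume a standard Spark distribution layout, and the host, port, and blank credentials assume a default non-secure local setup:

    # Start the Thrift JDBC/ODBC server (listens on port 10000 by default)
    ./sbin/start-thriftserver.sh

    # Connect with Beeline and run queries over JDBC
    ./bin/beeline
    beeline> !connect jdbc:hive2://localhost:10000

Beeline prompts for a username and password; in non-secure mode, your machine's username and a blank password suffice. To revert to the pre-1.6 single-session behavior, start the server with --conf spark.sql.hive.thriftServer.singleSession=true.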
For a complete list of options for starting the Thrift server, use the following command:
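    ./sbin/start-thriftserver.sh --help

This script also accepts all of the bin/spark-submit command-line options, plus a --hiveconf option to specify Hive properties.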