Spark SQL CLI
Spark provides the Spark SQL CLI to work with the Hive metastore service in local mode and execute queries entered on the command line.
You can start the Spark SQL CLI as follows:
./bin/spark-sql
Configuration of Hive is done by placing your hive-site.xml, core-site.xml, and hdfs-site.xml files in conf/. You may run ./bin/spark-sql --help for a complete list of all available options.
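For example, the CLI accepts the usual Hive-style flags for non-interactive use; the commands below are a sketch (they assume Spark is installed locally, and queries.sql is a hypothetical file of your own):

```shell
# Run a single query non-interactively and exit
./bin/spark-sql -e "SHOW TABLES;"

# Execute a file of semicolon-separated queries
./bin/spark-sql -f queries.sql

# Combine with standard Spark options, e.g. run locally with 4 cores
./bin/spark-sql --master local[4] -e "SELECT 1;"
```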
Working with other databases
We have seen how you can work with Hive, which is fast becoming the de facto data warehouse option in the open source community. However, most of the data in enterprises beginning their Hadoop or Spark journey is stored in traditional databases, including Oracle, Teradata, Greenplum, and Netezza. Spark provides you with the option to access those data sources using JDBC, which returns results as DataFrames. For the sake of brevity, we'll only share a Scala example of connecting to a Teradata database. Please remember to copy your database's JDBC driver JAR to all nodes...
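A minimal sketch of such a connection is shown below. The host name, database, table, and credentials are placeholders you must replace with your own; verify the driver class name and JDBC URL format against your version of the Teradata JDBC driver's documentation:

```scala
import org.apache.spark.sql.SparkSession

object TeradataJdbcExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TeradataJdbcExample")
      .getOrCreate()

    // Hypothetical connection details -- substitute your own host,
    // database, table, and credentials.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:teradata://td-host/DATABASE=mydb")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "mydb.sales")
      .option("user", "dbuser")
      .option("password", "dbpassword")
      .load()

    // The result comes back as an ordinary DataFrame.
    df.show()
    spark.stop()
  }
}
```

The same pattern works for Oracle, Greenplum, or Netezza; only the JDBC URL and driver class change, and in every case the driver JAR must be visible to both the driver and the executors.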