In this recipe, you will learn how to integrate Hive with Apache Spark, an open source cluster computing framework that can be used as a replacement for the MapReduce framework. You must have Apache Spark installed on your system before proceeding with this recipe.
Once Spark is installed, start the Spark master server by executing the following command:
./sbin/start-master.sh
Check whether the Spark master server has started by opening the following URL in a web browser:
http://<ip_address>:<port_number>
The exact URL is present at the following path:
/spark-1.6.0-bin-hadoop2.6/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
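If you prefer the command line, the master's `spark://` URL can be pulled straight out of that log file with `grep`. A minimal sketch, assuming the announcement line format written by the standalone master; the hostname and port below are illustrative values, and the `echo` simply stands in for the log file:

```shell
# The master's .out log contains an INFO line announcing the spark:// URL.
# Here a sample line is piped in via echo; against a real installation you
# would run:  grep -o "spark://[^ ]*" <path-to-master-log>
echo "INFO Master: Starting Spark master at spark://node1:7077" |
  grep -o "spark://[^ ]*"
```

The extracted URL is exactly the value you will pass to the slave startup script in the next step.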
Once the master server is started, start the slave service by executing the following command:
./sbin/start-slave.sh <master-spark-URL>
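The `<master-spark-URL>` argument takes the form `spark://<host>:<port>`. A small sketch of how it is composed; `node1` is an assumed hostname, and `7077` is the standalone master's default service port:

```shell
# Build the master URL from its parts; node1 and 7077 are example values.
MASTER_HOST=node1
MASTER_PORT=7077          # default port of the standalone master
MASTER_URL="spark://${MASTER_HOST}:${MASTER_PORT}"
echo "$MASTER_URL"        # prints spark://node1:7077

# The slave would then be started as:
#   ./sbin/start-slave.sh "$MASTER_URL"
```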
Refresh the master server's web UI; the newly started worker should now appear in its list of workers.
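With the master and worker running, Hive can be pointed at the Spark cluster from the Hive shell. A minimal configuration sketch using the standard Hive-on-Spark properties; the master URL placeholder follows the same `<ip_address>:<port_number>` convention used above:

```
hive> set hive.execution.engine=spark;
hive> set spark.master=spark://<ip_address>:<port_number>;
```

After these settings, queries issued in the session are submitted to the Spark cluster instead of being run as MapReduce jobs.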