Although Spark can be deployed in single-node, standalone mode, its capabilities are best suited to multi-node applications. With this in mind, we will dedicate most of this chapter to practical Big Data crunching with Spark and R on a Microsoft Azure HDInsight cluster. You should already be familiar with deploying HDInsight clusters, so our Spark workflows will contain one additional twist: the Spark framework will process the data straight from the **Hive** database, which will be populated with tables from HDFS. Hive extends the concepts covered in Chapter 5, *R with Relational Database Management Systems (RDBMSs)*, and Chapter 6, *R with Non-Relational (NoSQL) Databases*, where we discussed how R connects to relational and non-relational databases. Before we can use Hive, however, we first need to launch a new HDInsight cluster with Spark and RStudio.

