Now, let's explore the options for working with SparkR, including the shell, scripts, RStudio, and Zeppelin.
Note
All programs in this chapter were executed on the CDH 5.8 VM. In other environments, file paths might differ, but the concepts are the same.
The following steps explain how to install and configure R and the latest version of Spark:
As a first step, we need to install R on all machines in the cluster. The following exercises were tested on the CDH 5.7 QuickStart VM, which runs the CentOS 6.5 operating system. To install R, we first need to add the latest Extra Packages for Enterprise Linux (EPEL) repository to the VM. EPEL is a community-based repository project from the Fedora team that provides add-on packages for Red Hat and CentOS. Use the following commands to install R on the VM:
wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo rpm -ivh epel-release-6-8.noarch...
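Once the installation commands have finished, a quick check such as the following can confirm that R is available on the PATH of each machine. This is only a minimal sketch: `check_cmd` is a hypothetical helper name introduced here for illustration, not part of R, EPEL, or CDH.

```shell
# Hypothetical helper (an assumption for illustration): report whether
# a command can be found on the current PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# R is the interactive interpreter; Rscript runs R scripts non-interactively.
check_cmd R
check_cmd Rscript
```

Running the same check on every node is a simple way to catch a machine where the installation was skipped, since R has to be present on all machines in the cluster.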