In this chapter, we enter the diverse world of Big Data tools and applications that can be integrated with the R language relatively easily. We will present a set of guidelines and tips on the following topics:
Deploying cloud-based virtual machines with Hadoop, the ready-to-use Hadoop Distributed File System (HDFS), and MapReduce frameworks
Configuring your instance/virtual machine to include essential libraries and useful supplementary tools for data management in HDFS
Managing HDFS using shell/Terminal commands and running a simple MapReduce word count in Java for comparison
Integrating the R statistical environment with Hadoop on a single-node cluster
Managing files in HDFS and running simple MapReduce jobs using the rhadoop bundle of R packages
Carrying out more complex MapReduce tasks on large-scale electricity meter readings datasets on a multi-node HDInsight cluster on Microsoft Azure
However, just before we dive into...