RHadoop is a collection of three R packages for performing large-scale data operations from within an R environment. It was developed by Revolution Analytics, which is the leading commercial provider of software based on R. The three main packages are rhdfs, rmr, and rhbase, and each of them exposes a different Hadoop feature.
rhdfs is an R interface that makes HDFS usable from the R console. Because Hadoop MapReduce programs write their output to HDFS, that output is easy to access by calling the rhdfs methods. The R programmer can easily perform read and write operations on distributed data files. Behind the scenes, the rhdfs package calls the HDFS API to operate on data sources stored in HDFS.
rmr is an R interface that provides the Hadoop MapReduce facility inside the R environment. The R programmer only needs to divide the application logic into map and reduce phases and submit it with the rmr methods. After that, rmr calls the Hadoop streaming MapReduce API...
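To make the idea of dividing application logic into map and reduce phases concrete, here is a minimal word-count sketch in plain R. It deliberately uses no RHadoop package and needs no Hadoop cluster: the names map_fn, reduce_fn, and run_local are hypothetical, and run_local is only a tiny in-memory stand-in for the grouping step that the framework would normally perform between the two phases. With rmr, functions of roughly this shape would be handed to the package's own job-submission methods instead.

```r
# Plain-R illustration of splitting logic into map and reduce phases.
# Assumption: all names below are hypothetical and not part of rmr.

# Map phase: turn one line of text into (word, 1) pairs.
map_fn <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]  # drop empty tokens
  data.frame(key = words, value = rep(1L, length(words)),
             stringsAsFactors = FALSE)
}

# Reduce phase: combine all values emitted for a single key.
reduce_fn <- function(key, values) {
  data.frame(key = key, value = sum(values), stringsAsFactors = FALSE)
}

# Local driver standing in for the framework: map every input record,
# group the emitted values by key, then reduce each group.
run_local <- function(input, map_fn, reduce_fn) {
  mapped  <- do.call(rbind, lapply(input, map_fn))
  grouped <- split(mapped$value, mapped$key)
  reduced <- Map(reduce_fn, names(grouped), grouped)
  result  <- do.call(rbind, reduced)
  rownames(result) <- NULL
  result
}

lines <- c("R meets Hadoop", "Hadoop stores data", "R analyzes data")
word_counts <- run_local(lines, map_fn, reduce_fn)
print(word_counts)
```

The point of the sketch is the separation of concerns: map_fn sees one record at a time and reduce_fn sees one key at a time, so neither needs to know how the data is distributed.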