The rhdfs package is an interface between Hadoop and R that calls the HDFS API in the backend to operate on HDFS. As a result, you can easily work with HDFS from the R console by using the rhdfs package. In the following recipe, we will demonstrate how to use the rhdfs functions to manipulate HDFS.
To proceed with this recipe, you need to have completed the previous recipe by installing rhdfs into R, and validated that you can initialize HDFS via the hdfs.init function.
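Before calling hdfs.init, it can help to confirm that the environment is ready. The following is a minimal sketch of such a check, not part of rhdfs itself: the helper name check_hdfs_ready is hypothetical, and it assumes that the Hadoop binary location is supplied through the HADOOP_CMD environment variable.

```r
# Hypothetical helper (not part of rhdfs): reports why HDFS cannot be
# initialized, or "ok" when the basic prerequisites appear to be met.
check_hdfs_ready <- function(hadoop_cmd = Sys.getenv("HADOOP_CMD")) {
  # HADOOP_CMD must point to the hadoop executable
  if (!nzchar(hadoop_cmd)) {
    return("HADOOP_CMD is not set")
  }
  if (!file.exists(hadoop_cmd)) {
    return(paste("Hadoop binary not found at", hadoop_cmd))
  }
  # rhdfs must be installed and loadable
  if (!requireNamespace("rhdfs", quietly = TRUE)) {
    return("rhdfs is not installed")
  }
  "ok"
}

status <- check_hdfs_ready()
if (identical(status, "ok")) {
  library(rhdfs)
  hdfs.init()
} else {
  message("Cannot initialize HDFS: ", status)
}
```

If the check reports a problem, set HADOOP_CMD with Sys.setenv or revisit the installation steps from the previous recipe before continuing.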
Perform the following steps to operate files stored on HDFS:
- Initialize the rhdfs package:
> Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
> Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.2.0.jar")
> library(rhdfs)
> hdfs.init()
- You can then manipulate files stored on HDFS, as follows:
hdfs.put: Copy a file from the local filesystem to HDFS:
> hdfs.put('word.txt', './')
hdfs.ls: List the contents of a directory on HDFS:
...