Amazon Simple Storage Service (S3) is an online data store that users can write data to and read data from. More information about S3 is available at http://aws.amazon.com/s3/.
This recipe outlines the steps to configure S3 as the distributed data storage system for MapReduce.
Before getting started, we assume that you have successfully registered with AWS and that the client machine has been configured to access AWS.
Use the following steps to configure S3 for data storage:
Stop the Hadoop cluster using the following command:
stop-all.sh
Open the file $HADOOP_HOME/conf/core-site.xml and add the following contents into the file:
<property>
  <name>fs.default.name</name>
  <!-- value>master:54310</value-->
  <value>s3n://packt-bucket</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>AKIAJ7GAQT52MZKJA4WQ</value>
</property>
...
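Before restarting the cluster, it can help to confirm that the edited properties parse correctly and point at S3. The following is a minimal sketch in Python, not part of the recipe itself. The helper names read_hadoop_properties and check_s3n_config are hypothetical, the sample assumes the properties sit inside the usual <configuration> root element, and YOUR_ACCESS_KEY_ID is a placeholder. Only the two properties shown above are checked, because the rest of the listing is elided.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return a dict mapping property names to values from Hadoop-style XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def check_s3n_config(props):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    # The default filesystem should use the s3n:// scheme from the recipe.
    if not props.get("fs.default.name", "").startswith("s3n://"):
        problems.append("fs.default.name does not use the s3n:// scheme")
    # The access key ID property must be present and non-empty.
    if not props.get("fs.s3n.awsAccessKeyId"):
        problems.append("fs.s3n.awsAccessKeyId is missing or empty")
    return problems


# Illustrative sample only; the <configuration> wrapper and the placeholder
# key are assumptions, not copied from a real cluster.
SAMPLE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- value>master:54310</value-->
    <value>s3n://packt-bucket</value>
  </property>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for problem in check_s3n_config(read_hadoop_properties(SAMPLE)):
        print("WARNING:", problem)
```

To check a real installation, the contents of $HADOOP_HOME/conf/core-site.xml could be read from disk and passed to read_hadoop_properties in place of SAMPLE. The commented-out master:54310 value is ignored by the parser, so only the s3n value is reported.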