Now, let's see how you can execute a streaming job on an EMR cluster. To launch an EMR cluster from the AWS management console, refer to Chapter 6, Executing Hadoop Jobs on an Amazon EMR Cluster; to launch a cluster using the CLI client tool, refer to Chapter 8, Amazon EMR – Command-line Interface Client.
While launching the cluster, go to the Steps section and select Streaming program from the Add step drop-down list, as shown in the following screenshot:
After that, click on Configure and add. This brings up a pop-up box where you can define the parameters for your streaming job. Your mapper and reducer executables, along with the input files, must already be present in S3. The following screenshot shows the various parameters:
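In text form, a streaming step of this kind comes down to a handful of Hadoop streaming options. The following is a minimal sketch, not the console's actual implementation: it assumes the dialog's fields correspond to the standard `-mapper`, `-reducer`, `-input`, and `-output` options, and the function name `build_streaming_args` and the `my-bucket` S3 paths are hypothetical placeholders.

```python
import shlex


def build_streaming_args(mapper, reducer, input_uri, output_uri, extra_args=""):
    """Assemble a Hadoop streaming argument list for one step.

    All values are passed through unchanged; nothing is validated
    against S3. `extra_args` is a space-separated string of additional
    options (an assumption about how optional arguments are supplied).
    """
    # Generic Hadoop options (for example -D key=value) must come before
    # the streaming-specific options, so any extras are placed first.
    args = shlex.split(extra_args)
    args += [
        "-mapper", mapper,
        "-reducer", reducer,
        "-input", input_uri,
        "-output", output_uri,
    ]
    return args


if __name__ == "__main__":
    # Hypothetical bucket and file names, for illustration only.
    step_args = build_streaming_args(
        mapper="s3://my-bucket/code/mapper.py",
        reducer="s3://my-bucket/code/reducer.py",
        input_uri="s3://my-bucket/input/",
        output_uri="s3://my-bucket/output/",
        extra_args="-D mapred.reduce.tasks=2",
    )
    print(" ".join(step_args))
```

Keeping the step definition as a plain list like this makes it easy to review the locations of the mapper, reducer, input, and output before entering them in the dialog.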
After you have entered the required parameters, click on Add. Optionally, you can also enter a list of arguments (space-separated strings) to pass to the Hadoop streaming...