Hadoop MapReduce v2 Cookbook - Second Edition: RAW - Second Edition
Another interesting view of a dataset is a histogram. A histogram is meaningful only for a continuous dimension (for example, access time or file size). It divides that dimension into ranges and counts the number of occurrences of an event within each range. In this recipe, we take the access time as the dimension and group the accesses by the hour of the day.
The following figure shows the execution summary of this computation. The Mapper emits the hour of the access as the key and 1 as the value. Then, each reduce function invocation receives all the occurrences of a certain hour of the day, and it calculates the total number of occurrences for that hour of the day.
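The map and reduce logic described above can be sketched in plain Java, without the Hadoop API, to make the data flow concrete. This is a minimal illustration and not the recipe's actual implementation: the class name `HourHistogram` and its methods are hypothetical, and the sketch assumes that each log line carries a Common Log Format timestamp such as `[01/Jul/1995:00:00:01 -0400]`.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, Hadoop-free sketch of the histogram computation.
class HourHistogram {
    // Assumed timestamp layout: [dd/Mon/yyyy:HH:mm:ss zone].
    // Group 1 captures the two-digit hour of the day.
    private static final Pattern TIMESTAMP =
            Pattern.compile("\\[\\d{2}/\\w{3}/\\d{4}:(\\d{2}):\\d{2}:\\d{2}");

    // Map step: extract the hour of the access, which acts as the key.
    // The emitted value is always 1, so it is left implicit here.
    // Lines without a recognizable timestamp produce no output.
    static OptionalInt mapHour(String logLine) {
        Matcher m = TIMESTAMP.matcher(logLine);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    // Reduce step: sum the 1s emitted for each hour. The TreeMap stands in
    // for the shuffle phase, which groups values by key and sorts the keys.
    static Map<Integer, Integer> histogram(List<String> logLines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            mapHour(line).ifPresent(hour -> counts.merge(hour, 1, Integer::sum));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines written for illustration in the assumed format.
        List<String> sample = List.of(
                "host1 - - [01/Jul/1995:00:00:01 -0400] \"GET /a HTTP/1.0\" 200 6245",
                "host2 - - [01/Jul/1995:13:05:09 -0400] \"GET /b HTTP/1.0\" 200 100");
        histogram(sample).forEach((hour, count) ->
                System.out.println(hour + "\t" + count));
    }
}
```

In a real MapReduce job, the body of `mapHour` would sit in the Mapper and the summation in the Reducer. The tab-separated hour and count pairs printed by `main` have the kind of two-column layout that a plotting tool such as gnuplot can read.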

This recipe assumes that you have a working Hadoop installation. You also need gnuplot to plot the results, so install it if it is not already available.
The following steps show how to calculate and plot a histogram:
Download the weblog dataset from ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz and extract it.
Upload the...