Hadoop MapReduce v2 Cookbook - Second Edition: RAW - Second Edition
In this recipe, a frequency distribution is the number of hits received by each URL, sorted in ascending order of hit count. We already calculated the number of hits for each URL in an earlier recipe; this recipe sorts that list by the number of hits.
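The following plain Java sketch illustrates the ordering this recipe produces. It sorts a handful of hypothetical (URL, hit count) pairs in ascending order of hits. It does not use Hadoop. The class name `FrequencyDistributionSketch`, the method `sortByHits`, and the sample URLs are all made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of the recipe's sample code.
class FrequencyDistributionSketch {

    // Returns the (URL, hits) entries ordered by hit count, smallest first.
    static List<Map.Entry<String, Integer>> sortByHits(Map<String, Integer> hitCounts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(hitCounts.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        return entries;
    }

    public static void main(String[] args) {
        // Made-up hit counts standing in for the earlier recipe's output.
        Map<String, Integer> hits = new HashMap<>();
        hits.put("/index.html", 120);
        hits.put("/about.html", 7);
        hits.put("/contact.html", 42);

        for (Map.Entry<String, Integer> e : sortByHits(hits)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running `main` prints `/about.html`, `/contact.html`, and `/index.html` in that order. Because `List.sort` is stable, URLs with equal hit counts would keep the unspecified iteration order of the `HashMap`.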
This recipe assumes that you have a working Hadoop installation. It uses the results of the Performing GROUP BY using MapReduce recipe of this chapter, so follow that recipe first if you have not done so already.
The following steps show how to calculate frequency distribution using MapReduce:
Run the MapReduce job using the following command. We assume that the data/hit-count-out path contains the output of the HitCountMapReduce computation of the previous recipe:
$ bin/hadoop jar hcb-c5-samples.jar \
    chapter5.weblog.FrequencyDistributionMapReduce \
    data/hit-count-out data/freq-dist-out
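A MapReduce job can obtain this ordering without an explicit sort, because the framework sorts map output keys before they reach the reducer. One common approach, assumed here rather than taken from the `FrequencyDistributionMapReduce` source, is for the mapper to swap each record so that the hit count becomes the key and the URL becomes the value. The plain Java sketch below imitates the map, shuffle/sort, and reduce phases in memory. It assumes the input lines look like `URL<TAB>hits`, which is the layout Hadoop's default `TextOutputFormat` writes. The class name `SwapAndSortSketch`, its methods, and the sample data are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of a swap-and-sort MapReduce job.
class SwapAndSortSketch {

    // Map phase (assumed): parse "URL<TAB>hits" and emit (hits, URL),
    // so the hit count becomes the key that the framework sorts on.
    static Map.Entry<Integer, String> map(String line) {
        String[] parts = line.split("\t");
        int hits = Integer.parseInt(parts[1].trim());
        return new AbstractMap.SimpleEntry<>(hits, parts[0]);
    }

    // Shuffle/sort phase: group values by key in ascending key order.
    // A TreeMap stands in for Hadoop's sort of map output keys.
    static TreeMap<Integer, List<String>> shuffle(List<String> lines) {
        TreeMap<Integer, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<Integer, String> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // Reduce phase: keys arrive smallest first; write one "URL<TAB>hits" line per URL.
    static List<String> reduce(TreeMap<Integer, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : grouped.entrySet()) {
            for (String url : e.getValue()) {
                out.add(url + "\t" + e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up lines in the assumed layout of data/hit-count-out.
        List<String> hitCountOut = Arrays.asList(
                "/index.html\t120",
                "/about.html\t7",
                "/contact.html\t42",
                "/faq.html\t7");
        for (String line : reduce(shuffle(hitCountOut))) {
            System.out.println(line);
        }
    }
}
```

With the sample input, `main` prints `/about.html` and `/faq.html` (7 hits each), then `/contact.html` (42) and `/index.html` (120). Two caveats apply to a real job. Hadoop does not guarantee the order of values that share a key. The output is also only globally sorted when the job uses a single reducer or a total-order partitioner; with several reducers, each output file is sorted independently. The actual job's output layout may also differ from the `URL<TAB>hits` lines written here.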
Read the results by running the following command:
$ hdfs dfs -cat data/freq-dist...