Hadoop MapReduce v2 Cookbook - Second Edition: RAW


Calculating frequency distributions and sorting using MapReduce


In this recipe, the frequency distribution is the list of URLs together with the number of hits each one received, sorted in ascending order of hit count. The earlier recipe already calculated the number of hits for each URL; this recipe sorts that list by the number of hits.
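The sorting idea can be illustrated without a cluster. MapReduce sorts map output by key before it reaches the reducers, so one common way to sort by hit count is to have the mapper emit the count as the key and the URL as the value. The following is a minimal plain-Java sketch of that idea, not the recipe's actual FrequencyDistributionMapReduce source. It assumes the input lines have the form URL, tab, count, and it uses a TreeMap to stand in for the framework's shuffle and sort phase. The class name, method names, and sample URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation; it does not use the Hadoop API.
public class FrequencyDistributionSketch {

    // "Map" step: parse one "url<TAB>count" line (assumed input format)
    // and emit (count, url), so the hit count becomes the sort key.
    static Map.Entry<Integer, String> map(String line) {
        String[] parts = line.split("\t");
        return Map.entry(Integer.parseInt(parts[1].trim()), parts[0]);
    }

    // Simulates map -> shuffle/sort -> reduce for a list of input lines.
    static List<String> run(List<String> lines) {
        // The TreeMap stands in for the shuffle phase: keys are kept in
        // ascending order and values with the same key are grouped.
        TreeMap<Integer, List<String>> shuffled = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<Integer, String> kv = map(line);
            shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                    .add(kv.getValue());
        }

        // "Reduce" step: write one "url<TAB>count" line per URL,
        // visiting the keys in the sorted order produced above.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : shuffled.entrySet()) {
            for (String url : e.getValue()) {
                out.add(url + "\t" + e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data in the assumed hit-count format.
        List<String> hitCounts = List.of(
                "/index.html\t42",
                "/about.html\t7",
                "/images/logo.gif\t19");
        run(hitCounts).forEach(System.out::println);
    }
}
```

In a real job, the framework performs the sort across the cluster. A single reducer yields one globally sorted output file, while multiple reducers each produce a file that is sorted only within itself unless a total-order partitioner is used.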

Getting ready

This recipe assumes that you have a working Hadoop installation. It uses the results of the Performing GROUP BY using MapReduce recipe of this chapter, so follow that recipe first if you have not done so already.

How to do it...

The following steps show how to calculate frequency distribution using MapReduce:

  1. Run the MapReduce job using the following command. We assume that the data/hit-count-out path contains the output of the HitCountMapReduce computation of the previous recipe:

    $ bin/hadoop jar hcb-c5-samples.jar \
    chapter5.weblog.FrequencyDistributionMapReduce \
    data/hit-count-out data/freq-dist-out
    
  2. Read the results by running the following command:

    $ hdfs dfs -cat data/freq-dist...