Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Calculating Scatter plots using MapReduce


A scatter plot is another useful tool for analyzing data. It plots two measurements (dimensions) against each other, which makes the relationship between them easier to see.

For example, this recipe analyzes weblog data to find the relationship between the size of web pages and the number of hits they receive.

The following image shows the execution summary of this computation. Here, the map function extracts the message size (rounded to 1024 bytes) and emits it as the key, with one as the value. The reduce function then counts the number of occurrences of each message size.
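The map and reduce logic described above can be illustrated with a small, Hadoop-free simulation. This is only a sketch under stated assumptions, not the recipe's actual implementation: it assumes the log lines follow the Common Log Format (with the response size as the last field), it assumes that "rounded to 1024 bytes" means rounding down to a multiple of 1024, and the names `map_line`, `reduce_counts`, `run`, and `SAMPLE_LINES` (along with the sample log lines themselves) are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout (Common Log Format):
#   host ident user [timestamp] "request" status bytes
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(.*)" (\d{3}) (\d+|-)$')

BUCKET_BYTES = 1024  # bucket width taken from the recipe text


def map_line(line):
    """Map step: emit (size_bucket, 1) for one log line.

    Lines that do not parse, or that have no size ("-"), emit nothing.
    """
    match = LOG_PATTERN.match(line.strip())
    if match is None or match.group(5) == "-":
        return []
    size = int(match.group(5))
    # Assumption: round down to the nearest multiple of 1024 bytes.
    bucket = (size // BUCKET_BYTES) * BUCKET_BYTES
    return [(bucket, 1)]


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each size bucket."""
    counts = defaultdict(int)
    for bucket, value in pairs:
        counts[bucket] += value
    return dict(sorted(counts.items()))


def run(lines):
    """Run the map step over every line, then reduce all emitted pairs."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return reduce_counts(pairs)


# Hypothetical sample lines, made up for illustration only.
SAMPLE_LINES = [
    'host1.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /a.html HTTP/1.0" 200 6245',
    'host2.example.com - - [01/Jul/1995:00:00:06 -0400] "GET /b.html HTTP/1.0" 200 3985',
    'host3.example.com - - [01/Jul/1995:00:00:09 -0400] "GET /c.html HTTP/1.0" 200 4085',
    'host4.example.com - - [01/Jul/1995:00:00:11 -0400] "GET /d.html HTTP/1.0" 304 0',
]

if __name__ == "__main__":
    for bucket, hits in run(SAMPLE_LINES).items():
        print(bucket, hits)
```

Each (size bucket, hit count) pair in the result corresponds to one point on the scatter plot. In a real Hadoop job the framework would group the emitted pairs by key before the reduce step. Here, `reduce_counts` does that grouping in memory.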

Getting ready

This recipe assumes that you have a working Hadoop installation. You also need to install gnuplot.

How to do it...

The following steps show how to use MapReduce to calculate the data for a scatter plot of web page size against the number of hits:

  1. Download the weblog dataset from ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz and extract it.

  2. Upload the extracted data to HDFS by running the following...