Haskell Data Analysis Cookbook
MapReduce is a framework for efficient parallel algorithms that take advantage of divide and conquer. If a task can be split into smaller, independent subtasks, and the results of those subtasks can be combined to form the final answer, then MapReduce is likely a good fit for the job.
In the following figure, we can see that a large list is split up, and the mapper functions work in parallel on each split. After all the mapping is complete, the second phase of the framework kicks in, reducing the various calculations into one final answer.
In this recipe, we will be counting word frequencies in a large corpus of text. Given many files of words, we will apply the MapReduce framework to find the word frequencies in parallel.
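The two phases can be sketched in a few lines of Haskell. This is a minimal illustration rather than the recipe's own code: it assumes the parallel and containers packages are available, and the names countWords, mergeCounts, and wordFrequencies are hypothetical helpers introduced here. Each input string stands in for the contents of one file.

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)
import Data.Char (isAlpha, toLower)
import qualified Data.Map.Strict as M

-- Map phase (hypothetical helper): count the words in one chunk of text.
-- Non-letters are treated as separators and letters are lower-cased.
countWords :: String -> M.Map String Int
countWords = M.fromListWith (+) . map (\w -> (w, 1)) . words . map normalize
  where
    normalize c
      | isAlpha c = toLower c
      | otherwise = ' '

-- Reduce phase (hypothetical helper): merge the per-chunk counts
-- by adding the frequencies of words that appear in several chunks.
mergeCounts :: [M.Map String Int] -> M.Map String Int
mergeCounts = M.unionsWith (+)

-- Run the mappers in parallel, one spark per chunk, then reduce.
-- rdeepseq forces each map fully so the work happens inside the spark.
wordFrequencies :: [String] -> M.Map String Int
wordFrequencies = mergeCounts . parMap rdeepseq countWords
```

Loaded in GHCi, `wordFrequencies ["the cat sat", "The dog sat."]` yields a map in which "sat" and "the" each have a count of 2. To actually use several cores, the program has to be compiled with GHC's -threaded flag and run with +RTS -N.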

Install the parallel package using cabal as follows:
$ cabal install parallel
Create multiple files with words. In this recipe, we download a huge text file and split it up using the UNIX split command as follows...