MapReduce is a programming approach that allows systems to process large datasets in parallel.
The key concept is the use of two functions, Map and Reduce, that are combined to produce a desired result.
It originated in functional programming, where map and reduce operations have been available in languages such as LISP for several decades. Google has been a driver in bringing it out of the functional programming paradigm and into the object-oriented programming (OOP) world. Google's contributions include publishing a seminal paper on the subject in 2004 and being granted a patent on the technology.
So how does MapReduce work? The Map function takes a data set and operates on it, returning another data set as output. This output is then fed to the Reduce function, which operates on it in turn and returns a smaller data set.
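The flow described above can be sketched in a few lines of Python. This is only a single-process illustration, not a distributed implementation, and the names map_step and reduce_step, as well as the choice of squaring and summing, are assumptions made for the example rather than anything prescribed by MapReduce itself.

```python
from functools import reduce

def map_step(values):
    """Map: apply a function to every element, giving a new list of the same length."""
    return list(map(lambda x: x * x, values))

def reduce_step(values):
    """Reduce: combine the mapped elements into a single, smaller result."""
    return reduce(lambda acc, x: acc + x, values, 0)

numbers = [1, 2, 3, 4]
squared = map_step(numbers)    # [1, 4, 9, 16]
total = reduce_step(squared)   # 30
print(squared, total)
```

Because the Map step treats each element independently, the input could in principle be split across many machines, with only the Reduce step needing to bring the partial results together.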
So let's look at an example of how the Map function operates. The pseudocode function CtoF in the following code takes a list...