We will get into the what and how of MapReduce in a bit, but first, suppose you have a simple counting problem at hand: you need to count the number of hits to your website per country or per city. The only hurdle is the sheer amount of input data. Your website is quite popular, and it generates huge volumes of access logs every day. You also need to put a system in place that sends a daily report to top management showing the total number of views per country.
Had it been a few hundred MBs of access logs, or even a few GBs, you could easily write a standalone application that crunches the data and counts the views per country in a few hours. But what do you do when the input data runs to hundreds of GBs?
The best way to handle this is to create a processing system that works on parts of the input data in parallel and ultimately combines all the results. This...
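To make the split, count, and combine idea concrete, here is a minimal Python sketch. It is not MapReduce itself, only an illustration of the principle on one machine. The log format (`<ip> <country> <path>`), the sample data, and the function names `count_chunk`, `split_into_chunks`, and `count_hits_per_country` are all assumptions made for this example. A thread pool stands in for what would, in a real system, be separate processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(lines):
    """Count hits per country for one chunk of log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            # Assumption: the second field is the country code.
            counts[fields[1]] += 1
    return counts


def split_into_chunks(lines, chunk_size):
    """Split the input into consecutive chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def count_hits_per_country(lines, chunk_size=2, workers=4):
    """Count each chunk independently, then merge the partial counts."""
    chunks = split_into_chunks(lines, chunk_size)
    # Each chunk is handed to a worker; the workers do not share any state.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_chunk, chunks))
    # Combine step: add up the per-chunk counts into one total.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical sample data in the assumed "<ip> <country> <path>" format.
sample_log = [
    "10.0.0.1 US /index.html",
    "10.0.0.2 IN /about.html",
    "10.0.0.3 US /index.html",
    "10.0.0.4 DE /contact.html",
    "10.0.0.5 IN /index.html",
]

hits = count_hits_per_country(sample_log)
print(dict(hits))
```

Because each chunk is counted without reference to any other chunk, the counting step can be spread over as many workers as there are chunks, and only the small per-chunk totals need to be brought together at the end.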