In this appendix, we will look at sample MapReduce programs for building Solr indexes, starting with a simple word-count example.
Let's say we have three files containing the following text, and we want to count the occurrences of each word:
[I enjoy walking on the beach sand. The Maya beach is what I enjoy most.]
[John loves to play volleyball on the beach.]
[We enjoy watching television.]
The files are split into blocks and replicated across multiple data nodes. The map function then extracts a count of words from each block. The following <key, value> pairs are the output of Hadoop's map function:
<I, 2> <enjoy, 2> <walking, 1> <on, 1> <the, 2> <beach, 2> <sand, 1> <maya, 1> <is, 1> <what, 1> <most, 1>
<John, 1> <loves, 1> <to, 1> <play, 1> <volleyball, 1> <on, 1> <the, 1> <beach, 1>
<we, 1> <enjoy, 1> <watching, 1> <television, 1>
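The map phase above can be simulated outside Hadoop. The following is a minimal Python sketch (not actual Hadoop code) that tokenizes each sample file and emits per-file <word, count> pairs; the file contents are the three examples above, lowercased for consistent counting:

```python
import re
from collections import Counter

# The three hypothetical input files from the example above
files = [
    "I enjoy walking on the beach sand. The Maya beach is what I enjoy most.",
    "John loves to play volleyball on the beach.",
    "We enjoy watching television.",
]

def map_word_count(text):
    """Map function: emit <word, count> pairs for a single input split."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)

# One map output per input split (file), as in the example
map_outputs = [map_word_count(f) for f in files]
```

Note that a real Hadoop mapper typically emits <word, 1> once per occurrence; summing per split, as done here, is what a combiner would do, and it matches the per-file pairs shown above.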
Now, the reduce task merges all these together and reduces them to a single set of <key, value> pairs, giving us the total count of each word:
<I, 2> <enjoy, 3> <walking, 1> <on, 2> <the, 3> <beach, 3> <sand, 1> <maya, 1> <is, 1> <what, 1> <most, 1> <John, 1> <loves, 1> <to, 1> <play, 1> <volleyball, 1> <we, 1> <watching, 1> <television, 1>
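The merge step can likewise be sketched in Python (again, a simulation rather than real Hadoop code). Here the per-file map outputs, hard-coded from the pairs shown above, are reduced into a single set of totals:

```python
from collections import Counter

# Per-file <word, count> pairs produced by the map phase (from the example above)
map_outputs = [
    Counter({"i": 2, "enjoy": 2, "walking": 1, "on": 1, "the": 2,
             "beach": 2, "sand": 1, "maya": 1, "is": 1, "what": 1, "most": 1}),
    Counter({"john": 1, "loves": 1, "to": 1, "play": 1, "volleyball": 1,
             "on": 1, "the": 1, "beach": 1}),
    Counter({"we": 1, "enjoy": 1, "watching": 1, "television": 1}),
]

def reduce_word_count(outputs):
    """Reduce function: merge all map outputs, summing counts per word."""
    totals = Counter()
    for counts in outputs:
        totals.update(counts)  # Counter.update adds counts rather than replacing
    return totals

totals = reduce_word_count(map_outputs)
```

In a real job, Hadoop's shuffle phase groups all values for each key before the reducer sums them; `Counter.update` collapses both steps here for brevity.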
Now, we will look at some sample programs for different implementations.