The final step to complete the big data architecture is surprisingly simple, because, as is often the case with good functional-style designs, most of the work is already done.
This recipe simply extends the existing TF-IDF DRPC query that we defined in Chapter 4, Distributed Remote Procedure Calls. We need three new state sources that represent the D, DF, and TF values computed in the Batch layer. We will combine the values from these states with the existing state before performing the final TF-IDF calculation.
Start from the inside out by creating the combination function, called BatchCombiner, within the storm.cookbook.tfidf.function package, and implement the logic to combine two versions of the same state. One version should be from the current hour, and the other from all the data prior to the current hour:

    public void execute(TridentTuple tuple, TridentCollector collector) {
      try {
        double d_rt = (double) tuple.getLongByField...
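The fragment above is truncated. As a rough, self-contained sketch of what the combination logic amounts to, the snippet below sums each pair of batch and real-time counts and then applies the standard TF-IDF formula, tf * log(d / (1 + df)). This is plain Java with no Storm dependencies; the parameter names and the exact formula are assumptions based on the common TF-IDF definition, not the book's verbatim implementation.

```java
public class BatchCombinerSketch {

    // Combine the two versions of each state value (batch = all data prior
    // to the current hour, rt = the current hour) by summing them, then
    // compute TF-IDF as tf * log(d / (1 + df)).
    static double tfidf(long dBatch, long dRt,
                        long dfBatch, long dfRt,
                        long tfBatch, long tfRt) {
        double d  = dBatch  + dRt;   // total number of documents
        double df = dfBatch + dfRt;  // documents containing the term
        double tf = tfBatch + tfRt;  // term frequency within the document
        return tf * Math.log(d / (1.0 + df));
    }

    public static void main(String[] args) {
        // Hypothetical counts: 90 batch docs + 10 current-hour docs,
        // the term appears in 9 + 1 of them, with term frequency 4 + 1.
        System.out.println(tfidf(90, 10, 9, 1, 4, 1));
    }
}
```

In the real Trident function, these six values would instead be read from the tuple (as the truncated `getLongByField` call suggests) and the result emitted through the collector.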