In the previous recipe, we looked at deriving precomputed views of our data, using some immutable data as the source. There, the data was created statically. In an operational system, we need Storm to store the immutable data in Hadoop so that it is available for whatever preprocessing is required.
As each tuple is processed in Storm, we must generate an Avro record based on the document record definition and append it to the data file within the Hadoop filesystem.
To do this, we will create a Trident function that takes each document tuple and stores the associated Avro record.
Within the tfidf-topology project created in Chapter 3, Calculating Term Importance with Trident, inside the storm.cookbook.tfidf.function package, create a new class named PersistDocumentFunction that extends BaseFunction. Within the prepare function, initialize the Avro schema and document writer:

public void prepare(Map conf, TridentOperationContext context) {
    try {...
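The listing above is truncated here. The following is a minimal sketch of how the complete function might look, assuming a simple two-field document schema, a hard-coded NameNode URL, an output path of /tfidf/documents.avro, and tuple fields named documentId and document; all of these are illustrative placeholders, and the book's actual schema and configuration may differ:

package storm.cookbook.tfidf.function;

import java.net.URI;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.operation.TridentOperationContext;
import storm.trident.tuple.TridentTuple;

public class PersistDocumentFunction extends BaseFunction {

    // Hypothetical document record definition; the real project
    // would define this schema to match its document tuples.
    private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"document\","
            + "\"fields\":[{\"name\":\"docId\",\"type\":\"string\"},"
            + "{\"name\":\"contents\",\"type\":\"string\"}]}";

    private Schema schema;
    private DataFileWriter<GenericRecord> writer;

    @Override
    public void prepare(Map conf, TridentOperationContext context) {
        try {
            // Parse the Avro schema and open a data file on HDFS.
            // The NameNode URL and output path are assumptions.
            schema = new Schema.Parser().parse(SCHEMA_JSON);
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://localhost:8020"), new Configuration());
            writer = new DataFileWriter<GenericRecord>(
                    new GenericDatumWriter<GenericRecord>(schema));
            writer.create(schema, fs.create(new Path("/tfidf/documents.avro")));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        try {
            // Build an Avro record from the incoming document tuple
            // and append it to the data file. Field names are assumed.
            GenericRecord record = new GenericData.Record(schema);
            record.put("docId", tuple.getStringByField("documentId"));
            record.put("contents", tuple.getStringByField("document"));
            writer.append(record);
            writer.flush();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

Note that DataFileWriter buffers its output, so flushing after every append trades throughput for durability; in a high-volume topology you might flush periodically or in batches instead.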