For this example, we will again use Trident and build on the topology that we constructed in the previous chapter. The Trident topology is depicted as follows:
The TwitterSpout
performs the search against the Twitter API periodically, emitting the tweets that it returns into a Trident stream. The TweetSplitterFunction
then parses the tweets and emits a tuple for each word in the tweets. The WordFrequencyFunction
enriches each tuple with the count for that word from a random sample of the English language. Finally, we let Druid consume that information to perform the aggregations over time. Druid partitions the data into temporal slices and persists that data as described previously.
In this case, because the persistence mechanism is our means of addressing fault tolerance/system failure, the persistence mechanism should distribute storage and provide both consistency and high-availability. Additionally, Hadoop should be capable of using the persistence...