This chapter presents an implementation of a well-known data processing algorithm, Term Frequency–Inverse Document Frequency (TF-IDF), using Storm's Trident API. TF-IDF is a numerical statistic that reflects how important a word is to a document within a collection of documents. It is a key concern in search engines, and it is also an important starting point in sentiment mining, because trends in the important words within textual content can serve as a useful predictor or analytical tool.
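As a rough illustration of the statistic itself, independent of Storm or Trident, the following sketch scores a term against a tiny in-memory corpus. It assumes one common variant of the formula, tf × ln(N / df), where tf is the raw count of the term in the document, df is the number of documents containing the term, and N is the corpus size. The class name `TfIdfSketch`, its methods, and the sample documents are hypothetical and exist only for this example; other weightings and smoothing schemes are equally valid.

```java
import java.util.Arrays;
import java.util.List;

public class TfIdfSketch {
    // Term frequency: how many times the term occurs in a single document.
    public static long tf(List<String> doc, String term) {
        return doc.stream().filter(term::equals).count();
    }

    // Document frequency: how many documents contain the term at least once.
    public static long df(List<List<String>> corpus, String term) {
        return corpus.stream().filter(d -> d.contains(term)).count();
    }

    // Assumed variant: tf * ln(N / df). Returns 0 when no document contains the term.
    public static double tfIdf(List<String> doc, List<List<String>> corpus, String term) {
        long docFreq = df(corpus, term);
        if (docFreq == 0) {
            return 0.0;
        }
        return tf(doc, term) * Math.log((double) corpus.size() / docFreq);
    }

    public static void main(String[] args) {
        // Hypothetical, pre-tokenized documents.
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("storm", "processes", "streams"),
            Arrays.asList("storm", "trident", "trident"),
            Arrays.asList("search", "engines", "rank", "documents"));

        List<String> doc = corpus.get(1);
        System.out.println("trident: " + tfIdf(doc, corpus, "trident"));
        System.out.println("storm:   " + tfIdf(doc, corpus, "storm"));
    }
}
```

In the second document, "trident" scores higher than "storm": it occurs twice there and in no other document, whereas "storm" occurs once and also appears in another document, which lowers its inverse document frequency.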
Tip
TF-IDF drives many search engines, such as Apache Lucene. For details of how it is used in this context, read the documentation for the Similarity
class in Apache Lucene at http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/search/Similarity.html.
According to the Storm project wiki (https://github.com/nathanmarz/storm/wiki/Trident-tutorial), Trident is a new high-level abstraction for doing real-time computing on top of Storm. It allows...