Many of you may already be familiar with Storm; for those who aren't, here is a very brief introduction.
Storm provides a real-time computation framework for streaming data. The stream is Storm's core data abstraction: an unbounded sequence of tuples, where a tuple, a single unit of the streaming data in Storm terminology, is an ordered list of named values.
The worker components of a Storm job are divided into spouts and bolts. A spout is a source of streams. A bolt consumes one or more streams, performs whatever processing is required, and may emit new streams. You can interlink a number of spouts and bolts to create a topology, the top-level abstraction that you submit to the Storm cluster for execution.
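To make these abstractions concrete, here is a minimal Python sketch that simulates the spout-to-bolt stream flow in plain code. This is not Storm API code (real topologies are typically written in Java against Storm's `TopologyBuilder`); the field names and sample sentences are hypothetical, chosen only to illustrate tuples flowing through interlinked components:

```python
from typing import Dict, Iterable, Iterator, Tuple

def sentence_spout() -> Iterator[Tuple[str]]:
    """Spout: a source of streams; emits one-field tuples (hypothetical data)."""
    for sentence in ["the quick brown fox", "jumped over the dog"]:
        yield (sentence,)

def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: consumes a stream of sentences and emits a new stream of word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream: Iterable[Tuple[str]]) -> Dict[str, int]:
    """Terminal bolt: consumes the word stream and aggregates counts."""
    counts: Dict[str, int] = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
    return counts

# "Topology": the spout and bolts interlinked, source to target.
result = count_bolt(split_bolt(sentence_spout()))
print(result["the"])  # → 2 ("the" appears in both sentences)
```

In Storm itself, each component would run as its own task distributed across the cluster, and the wiring between them would be declared in the topology rather than expressed as nested function calls.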
The following diagram presents a sample Storm topology, showing the stream flow from source to target:
Let's now write our Storm job that will listen to live streaming tweets and inject the fields we want into Elasticsearch. To start with,...