Elasticsearch for Hadoop

By: Vishal Shukla

Injecting streaming data into Storm


Many of you may already be familiar with Storm. For those who are not, here is a very brief introduction.

Storm provides a real-time computation framework for streaming data. The stream is Storm's core data abstraction: an unbounded sequence of tuples, where a tuple, in Storm terminology, is a single unit of streaming data.

The worker components of a Storm job are divided into spouts and bolts. A spout is a source of streams. A bolt can consume multiple streams, perform any processing required, and may emit new streams. You can interlink any number of spouts and bolts to create a topology. A topology is the top-level abstraction that you submit to the Storm cluster for execution.
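To make these terms concrete, here is a minimal in-memory sketch of the concepts in plain Java. It deliberately does not use Storm's API: the `Spout` and `Bolt` interfaces, the `ToyTopology` class, and the sample texts are all hypothetical names invented for illustration, and a real Storm topology runs distributed across a cluster rather than in a single loop. Because a real stream is unbounded, the sketch caps the run at `maxTuples`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of Storm's concepts; NOT Storm's API (all names are hypothetical).
public class ToyTopology {

    // A tuple is one unit of streaming data; here it is simply a list of values.
    interface Spout {
        List<Object> nextTuple(); // a spout is a source of tuples
    }

    interface Bolt {
        // A bolt consumes one tuple and may emit zero or more new tuples.
        List<List<Object>> execute(List<Object> tuple);
    }

    // Assumed sample source: cycles through a few texts forever (unbounded).
    static class SampleTextSpout implements Spout {
        private final String[] texts = {"storm is fast", "", "hello elasticsearch"};
        private int next = 0;

        public List<Object> nextTuple() {
            String text = texts[next % texts.length];
            next++;
            return Arrays.<Object>asList(text);
        }
    }

    // Drops tuples whose text field is empty.
    static class NonEmptyFilterBolt implements Bolt {
        public List<List<Object>> execute(List<Object> tuple) {
            List<List<Object>> out = new ArrayList<>();
            if (!((String) tuple.get(0)).isEmpty()) {
                out.add(tuple);
            }
            return out;
        }
    }

    // Emits a new tuple of (text, wordCount).
    static class WordCountBolt implements Bolt {
        public List<List<Object>> execute(List<Object> tuple) {
            String text = (String) tuple.get(0);
            List<List<Object>> out = new ArrayList<>();
            out.add(Arrays.<Object>asList(text, text.split("\\s+").length));
            return out;
        }
    }

    // The "topology": one spout feeding a linear chain of bolts.
    static List<List<Object>> run(Spout spout, List<Bolt> bolts, int maxTuples) {
        List<List<Object>> results = new ArrayList<>();
        for (int i = 0; i < maxTuples; i++) {
            List<List<Object>> current = new ArrayList<>();
            current.add(spout.nextTuple());
            for (Bolt bolt : bolts) {
                List<List<Object>> emitted = new ArrayList<>();
                for (List<Object> t : current) {
                    emitted.addAll(bolt.execute(t));
                }
                current = emitted;
            }
            results.addAll(current);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Bolt> bolts = Arrays.<Bolt>asList(new NonEmptyFilterBolt(), new WordCountBolt());
        for (List<Object> tuple : run(new SampleTextSpout(), bolts, 4)) {
            System.out.println(tuple);
        }
    }
}
```

In this sketch, the second of the four tuples read from the spout carries an empty text and is dropped by the first bolt, so only three tuples reach the end of the chain. The linear chain is a simplification: in Storm, a bolt can subscribe to several streams and a topology can branch and merge.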

The following diagram shows a sample Storm topology, with the stream flowing from source to target:

Let's now write our Storm job that will listen to live streaming tweets and inject the fields we want into Elasticsearch. To start with,...