Learning Storm

By: Ankit Jain, Anand Nalya
Overview of this book

Starting with the very basics of Storm, you will learn how to set up Storm on a single machine and move on to deploying Storm on your cluster. You will understand how Kafka can be integrated with Storm using the Kafka spout.

You will then proceed to explore the Trident abstraction tool with Storm to perform stateful stream processing, guaranteeing exactly-once message processing in every topology. You will then learn how to integrate Hadoop with Storm. Next, you will learn how to integrate Storm with other well-known Big Data technologies such as HBase, Redis, and Kafka to realize the full potential of Storm.

Finally, you will perform in-depth case studies on Apache log processing and machine learning with a focus on Storm, and through these case studies, you will discover Storm's realm of possibilities.

Storm components


A Storm cluster follows a master-slave model where the master and slave processes are coordinated through ZooKeeper. The following are the components of a Storm cluster.

Nimbus

The Nimbus node is the master in a Storm cluster. It is responsible for distributing the application code across the worker nodes, assigning tasks to different machines, monitoring tasks for failures, and restarting them when required.

Nimbus is stateless and stores all of its data in ZooKeeper. There is a single Nimbus node in a Storm cluster. It is designed to be fail-fast, so when Nimbus dies, it can be restarted without affecting the tasks already running on the worker nodes. This is unlike Hadoop, where if the JobTracker dies, all running jobs are left in an inconsistent state and need to be executed again.
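
To make Nimbus's role concrete, the following is a minimal sketch of submitting a topology to a cluster. It assumes the Storm 0.9.x package names covered in this book and uses the TestWordSpout and TestWordCounter classes bundled with storm-core; the topology name and parallelism values are illustrative:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.testing.TestWordCounter;
    import backtype.storm.testing.TestWordSpout;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    public class NimbusSubmitSketch {
        public static void main(String[] args) throws Exception {
            // Wire a trivial word-count topology.
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("words", new TestWordSpout(), 1);
            builder.setBolt("counter", new TestWordCounter(), 2)
                   .fieldsGrouping("words", new Fields("word"));

            Config conf = new Config();
            conf.setNumWorkers(2); // worker processes requested from the supervisors

            // StormSubmitter uploads the topology JAR to Nimbus, which then
            // assigns tasks to the supervisor nodes through ZooKeeper.
            StormSubmitter.submitTopology("word-count-sketch", conf,
                    builder.createTopology());
        }
    }

Running this class with the storm jar command submits the topology to the Nimbus node configured in storm.yaml.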

Supervisor nodes

Supervisor nodes are the worker nodes in a Storm cluster. Each supervisor node runs a supervisor daemon that is responsible for creating, starting, and stopping worker processes to execute the tasks assigned to that node. Like Nimbus, a supervisor daemon is also fail-fast and stores all of its state in ZooKeeper so that it can be restarted without any state loss. A single supervisor daemon normally handles multiple worker processes running on that machine.
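
The number of worker processes a single supervisor can run is determined by the worker ports configured for it. As a minimal sketch, the following storm.yaml entry (the port values are illustrative; 6700 to 6703 happen to be the defaults) gives a supervisor four worker slots, one per port:

    supervisor.slots.ports:
        - 6700
        - 6701
        - 6702
        - 6703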

The ZooKeeper cluster

In any distributed application, various processes need to coordinate with each other and share some configuration information. ZooKeeper is an application that provides all these services in a reliable manner. Being a distributed application, Storm also uses a ZooKeeper cluster to coordinate its various processes. All of the state associated with the cluster and the various tasks submitted to Storm is stored in ZooKeeper. Nimbus and the supervisor nodes do not communicate directly with each other, but through ZooKeeper. As all data is stored in ZooKeeper, both the Nimbus and supervisor daemons can be killed abruptly without adversely affecting the cluster.
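
Both Nimbus and the supervisor daemons locate the ZooKeeper ensemble through storm.yaml. A minimal sketch, assuming a three-node ZooKeeper ensemble and a Nimbus machine named nimbus.example.com (the hostnames are illustrative, and nimbus.host is the setting used by the 0.9.x releases covered in this book):

    storm.zookeeper.servers:
        - "zk1.example.com"
        - "zk2.example.com"
        - "zk3.example.com"
    nimbus.host: "nimbus.example.com"

Every Storm daemon on every machine reads this file, which is how the otherwise independent Nimbus and supervisor processes end up coordinating through the same ZooKeeper cluster.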

The following figure shows the architecture of a Storm cluster:

[Figure: A Storm cluster's architecture]