Learning Storm

By: Ankit Jain, Anand Nalya

Overview of this book

Starting with the very basics of Storm, you will learn how to set up Storm on a single machine and move on to deploying Storm on your cluster. You will understand how Kafka can be integrated with Storm using the Kafka spout.

You will then explore the Trident abstraction with Storm to perform stateful stream processing, guaranteeing exactly-once processing of each message in a topology. You will move ahead to learn how to integrate Hadoop with Storm. Next, you will learn how to integrate Storm with other well-known Big Data technologies such as HBase, Redis, and Kafka to realize the full potential of Storm.

Finally, you will work through in-depth case studies on Apache log processing and machine learning with a focus on Storm, and through these case studies you will discover Storm's realm of possibilities.

Trident aggregators


A Trident aggregator is used to perform aggregation operations on an input batch, partition, or stream. For example, to count the number of tuples present in each batch, you can use the Count aggregator. The output of an aggregator completely replaces the input tuples. There are three types of aggregators available in Trident:

  • The partition aggregate

  • The aggregate

  • The persistent aggregate

Let's understand each type of aggregator in detail.
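Before looking at each type, it may help to see the contract an aggregator implements. The following standalone sketch imitates the shape of Trident's combiner-style Count aggregator (init produces a partial value per tuple, combine merges partials, zero is the identity for an empty input); the interface and class here are illustrative stand-ins, not the actual Storm classes, so the code runs without any Storm dependency:

```java
import java.util.Arrays;
import java.util.List;

public class CountSketch {
    // Minimal stand-in for Trident's combiner-aggregator contract:
    // init() maps one input tuple to a partial value, combine() merges
    // two partials, and zero() supplies the identity for empty input.
    interface CombinerAggregator<T> {
        T init(Object tuple);
        T combine(T a, T b);
        T zero();
    }

    // Count contributes 1 per tuple and sums the partial values.
    static class Count implements CombinerAggregator<Long> {
        public Long init(Object tuple) { return 1L; }
        public Long combine(Long a, Long b) { return a + b; }
        public Long zero() { return 0L; }
    }

    // Reduce one batch of tuples with the aggregator, producing a
    // single value that replaces the batch's tuples in the output.
    static <T> T aggregate(List<Object> batch, CombinerAggregator<T> agg) {
        T acc = agg.zero();
        for (Object tuple : batch) {
            acc = agg.combine(acc, agg.init(tuple));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Object> batch = Arrays.asList("a", "b", "c", "d");
        System.out.println(aggregate(batch, new Count())); // prints 4
    }
}
```

Note how the aggregator's single output value replaces the input tuples, which is exactly the replacement behavior described above.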

The partition aggregate

As the name suggests, the partition aggregate works on each partition instead of the entire batch. Its output completely replaces the input tuples, and it emits a single-field tuple for each partition. The following piece of code shows how we can use the partitionAggregate method:

mystream.partitionAggregate(new Fields("x"), new Count(), new Fields("count"));
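To see why a partition aggregate emits one value per partition rather than one per batch, here is a self-contained simulation in plain Java (no Storm dependency; the partition layout is hypothetical) of running Count as a partition aggregate over a single batch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionCountDemo {
    // Simulate partitionAggregate(..., new Count(), ...): each
    // partition of the batch independently produces one single-field
    // count tuple that replaces that partition's input tuples.
    static List<Long> partitionCounts(List<List<String>> partitions) {
        List<Long> counts = new ArrayList<>();
        for (List<String> partition : partitions) {
            long count = 0L;
            for (String tuple : partition) {
                count++;            // Count adds 1 per input tuple
            }
            counts.add(count);      // one output tuple per partition
        }
        return counts;
    }

    public static void main(String[] args) {
        // One batch of six tuples spread across three partitions.
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("a", "b", "c"),
            Arrays.asList("d"),
            Arrays.asList("e", "f"));
        System.out.println(partitionCounts(partitions)); // [3, 1, 2]
    }
}
```

The three outputs, one per partition, illustrate the difference from a whole-batch aggregate, which would emit a single count of 6 for the same batch.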