When defining a topology, we create a graph of computation in which a number of bolts process streams. At a more granular level, each bolt executes as multiple tasks in the topology. A stream is divided into partitions that are distributed among the tasks of each subscribed bolt. Thus, each task of a particular bolt receives only a subset of the tuples from the subscribed streams.
Stream grouping in Storm provides complete control over how tuples are partitioned among the tasks of a bolt subscribed to a stream. The grouping for a bolt is defined on the backtype.storm.topology.InputDeclarer instance returned by the backtype.storm.topology.TopologyBuilder.setBolt method when the bolt is defined.
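To make the idea of partitioning concrete, the following is a minimal sketch in plain Java that does not use the Storm API. It assumes a stream of single-value tuples represented as strings and shows two possible ways of dividing them among a bolt's tasks: one that ignores the tuple content and one that routes by the tuple value. The class name PartitionSketch and its methods are hypothetical names chosen for illustration, and the hashing scheme is an assumption, not necessarily what Storm does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: simulates dividing a stream of tuples among the
// tasks of a bolt. None of these names come from the Storm API.
final class PartitionSketch {

    // Content-agnostic partitioning: tuples are handed to tasks in turn,
    // so the load is even but equal values may land on different tasks.
    public static List<List<String>> partitionRoundRobin(List<String> tuples, int numTasks) {
        List<List<String>> tasks = emptyTasks(numTasks);
        for (int i = 0; i < tuples.size(); i++) {
            tasks.get(i % numTasks).add(tuples.get(i));
        }
        return tasks;
    }

    // Key-based partitioning: the task is chosen from the tuple's hash,
    // so equal values always go to the same task.
    public static List<List<String>> partitionByKey(List<String> tuples, int numTasks) {
        List<List<String>> tasks = emptyTasks(numTasks);
        for (String tuple : tuples) {
            // floorMod keeps the index non-negative for negative hash codes.
            int index = Math.floorMod(tuple.hashCode(), numTasks);
            tasks.get(index).add(tuple);
        }
        return tasks;
    }

    // Creates one empty tuple list per task.
    private static List<List<String>> emptyTasks(int numTasks) {
        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new ArrayList<>());
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> stream =
                Arrays.asList("storm", "bolt", "storm", "spout", "bolt", "storm");
        System.out.println("round-robin: " + partitionRoundRobin(stream, 3));
        System.out.println("by key:      " + partitionByKey(stream, 3));
    }
}
```

In both cases every task sees only part of the stream. The difference lies in which part, and that choice is what a stream grouping expresses.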
Storm supports the following types of stream groupings:
Shuffle grouping
Fields grouping
All grouping
Global grouping
Direct grouping
Local or shuffle grouping
Custom grouping
Now, we will look at each of these groupings in detail.