Repartitioning operations let a user control how tuples are partitioned across multiple tasks. A repartitioning operation does not change the content of the tuples. In addition, tuples pass over the network only when a repartitioning operation is applied. This section explains the different types of repartitioning operations.
The shuffle repartitioning operation partitions the tuples in a uniform, random way across multiple tasks. This repartitioning operation is generally used when we want to distribute our processing load uniformly across tasks. The following diagram shows how the input tuples are repartitioned using the shuffle operation:
The following piece of code shows how we can use the shuffle operation:
mystream.shuffle().each(new Fields("a","b"), new myFilter()).parallelismHint(2);
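To make the effect of a uniform, random distribution more concrete, the following is a minimal plain-Java sketch that simulates it outside of Storm. It is not Trident's own implementation: the class name ShuffleSketch, the method shuffle, and the fixed seed are hypothetical names chosen for illustration, and picking a task independently at random for each tuple is an assumption about how such a distribution could be modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: simulates shuffle-style repartitioning
// in plain Java, without any Storm or Trident dependency.
public class ShuffleSketch {

    // Assigns each tuple to one of numTasks buckets chosen uniformly at random.
    // The tuples themselves are never modified, only routed.
    static List<List<String>> shuffle(List<String> tuples, int numTasks, long seed) {
        Random random = new Random(seed); // fixed seed keeps runs repeatable
        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new ArrayList<>());
        }
        for (String tuple : tuples) {
            tasks.get(random.nextInt(numTasks)).add(tuple);
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> tuples = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            tuples.add("tuple-" + i);
        }
        List<List<String>> tasks = shuffle(tuples, 4, 42L);
        // Print how many tuples each simulated task received.
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("task " + i + " received " + tasks.get(i).size() + " tuples");
        }
    }
}
```

Running the sketch with 1,000 tuples and 4 tasks should show each task receiving a roughly equal share, while every tuple arrives unchanged at exactly one task. This mirrors the two properties described above: the load is spread evenly, and the content of the tuples is left untouched.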