The Trident aggregator is used to perform the aggregation operation on the input batch, partition, or input stream. For example, if a user wants to count the number of tuples present in each batch, then we can use the count aggregator to count the number of tuples in each batch. The output of the aggregator completely replaces the value of the input tuple. There are three types of aggregator available in Trident:
partitionAggregate
aggregate
persistenceAggregate
Let's understand each type of aggregator in detail.
As the name suggests, the partitionAggregate
works on each partition instead of the whole batch. The output of partitionAggregate
completely replaces the input tuple. Also, the output of partitionAggregate
contains a single-field tuple. Here is a piece of code that shows how we can use partitionAggregate
:
mystream.partitionAggregate(new Fields("x"), new Count() ,new new Fields("count"))
For example, we get an input stream containing the fields x
and...