As mentioned in the definition of the non-transactional topology, Trident processes tuples in batches, but this alone doesn't define what goes into each batch. In a transactional topology, a transactional spout guarantees the contents of each batch. A transactional spout has the following characteristics:
Each batch is assigned a unique transactional ID (txid).
In the case of failure, the entire batch is replayed. Hence, replays of the failed batch will contain the same set of tuples as the first time the batch was emitted. The txid of the failed batch also remains the same as the first time.
Tuples of one batch are not mixed with tuples of another batch. Hence, overlaps of tuples between batches are not allowed.
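The characteristics above can be illustrated with a small, self-contained Java sketch. It is not the Trident spout API: the class name TransactionalBatchSketch, the emitBatch method, the fixed batch size, and the convention that txid starts at 1 are all assumptions made for illustration. The only point it tries to show is that the contents of a batch depend solely on its txid, so a replay returns the same tuples and no tuple belongs to two batches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this class is not part of the Storm/Trident API.
class TransactionalBatchSketch {
    private final List<String> source; // replayable, ordered source of tuples (assumed)
    private final int batchSize;       // fixed batch size (assumed for simplicity)

    TransactionalBatchSketch(List<String> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = new ArrayList<>(source);
        this.batchSize = batchSize;
    }

    // The batch contents depend only on the txid, so a replay of a failed
    // batch returns the same tuples, and two different txids never overlap.
    List<String> emitBatch(long txid) {
        if (txid < 1) {
            throw new IllegalArgumentException("txid starts at 1 in this sketch");
        }
        long size = source.size();
        int start = (int) Math.min(size, (txid - 1) * batchSize);
        int end = (int) Math.min(size, txid * batchSize);
        return new ArrayList<>(source.subList(start, end));
    }

    public static void main(String[] args) {
        TransactionalBatchSketch spout = new TransactionalBatchSketch(
                Arrays.asList("India", "Japan", "India", "China", "Japan"), 2);
        System.out.println("txid 1, first attempt: " + spout.emitBatch(1));
        System.out.println("txid 1, replay:        " + spout.emitBatch(1));
        System.out.println("txid 2:                " + spout.emitBatch(2));
    }
}
```

Because a given txid always maps to the same slice of the source, downstream code can treat the txid as a stable label for "this exact set of tuples", which is what makes it possible to recognize a replayed batch.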
Let's consider the previous sample Trident topology example and see how we can write a transactional topology. Suppose the sample Trident topology computes the country field's count and stores the counts in a key/value store (Memory Map, Cassandra, Memcached...