To remove single points of failure from your data processing pipeline, Flume can send events to different sinks using either load balancing or failover. To do this, we need to introduce a new concept called a sink group. A sink group creates a logical grouping of sinks. The behavior of this grouping is dictated by the sink processor, which determines how events are routed.
There is a default sink processor, which contains a single sink and is used whenever a sink isn't part of any sink group. Our Hello, World! example in Chapter 2, A Quick Start Guide to Flume, used the default sink processor. No special configuration is required for single sinks.
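To make the single-sink case concrete, here is a minimal sketch of the sink-related lines of an agent configuration. The agent name (agent), sink name (k1), and channel name (c1) are placeholders chosen for illustration and are not taken from the Chapter 2 example; the source and channel definitions are omitted.

```properties
# Hypothetical names: agent "agent", sink "k1", channel "c1"
agent.sinks = k1
agent.sinks.k1.type = logger
agent.sinks.k1.channel = c1
# No sink group is declared, so k1 is handled by the default sink processor
```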
For Flume to know about the sink groups, there is a new top-level agent property called sinkgroups. As with sources, channels, and sinks, you prefix the property with the agent name:
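As a minimal sketch, assuming an agent named agent and a placeholder sink group name sg1 (both names are illustrative assumptions, not values confirmed by the text), the declaration might look like this:

```properties
# Hypothetical names: agent "agent", sink group "sg1"
# Declares the sink groups this agent knows about (space-separated if more than one)
agent.sinkgroups = sg1
```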
Here, we have defined a sink group called