
Apache Flume: Distributed Log Collection for Hadoop, Second Edition

By : Steven Hoffman

Sink groups


To remove single points of failure in your data processing pipeline, Flume can send events to different sinks using either load balancing or failover. To do this, we need to introduce a new concept: the sink group. A sink group is a logical grouping of sinks, and its behavior is dictated by something called the sink processor, which determines how events are routed among the sinks in the group.

There is a default sink processor that handles any sink that isn't part of a sink group; it simply sends events to that single sink. Our Hello, World! example in Chapter 2, A Quick Start Guide to Flume, used the default sink processor. No special configuration is required for single sinks.

To make Flume aware of your sink groups, there is a new top-level agent property called sinkgroups. As with sources, channels, and sinks, you prefix the property with the agent name:

agent.sinkgroups=sg1

Here, we have defined a sink group called sg1 for...
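To make this concrete, here is a minimal sketch of how a sink group might be filled out once it has been declared. The sink names k1 and k2 and the priority values are hypothetical, chosen only for illustration; they assume two sinks have already been configured elsewhere on the agent:

agent.sinkgroups=sg1
# Assign the (hypothetical) sinks k1 and k2 to the group
agent.sinkgroups.sg1.sinks=k1 k2
# Use the failover sink processor; load_balance is the other option
agent.sinkgroups.sg1.processor.type=failover
# Higher priority values are tried first, so k2 is preferred here
agent.sinkgroups.sg1.processor.priority.k1=5
agent.sinkgroups.sg1.processor.priority.k2=10

With this sketch, events flow to k2 while it is healthy; if k2 fails, the processor falls back to k1 until k2 recovers.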