Apache Flume: Distributed Log Collection for Hadoop, Second Edition

By Steven Hoffman

Chapter 3. Channels

In Flume, a channel is the construct used between sources and sinks. It provides a buffer for your in-flight events after they are read from sources until they can be written to sinks in your data processing pipelines.
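As a quick illustration of this wiring, here is a minimal agent configuration sketch; the agent name (agent), the component names (s1, c1, k1), and the choice of a netcat source and logger sink are placeholders for illustration rather than anything prescribed in the text:

# Declare the named components of this agent
agent.sources = s1
agent.channels = c1
agent.sinks = k1

# Source: reads events (here, lines arriving on a netcat connection)
agent.sources.s1.type = netcat
agent.sources.s1.bind = 0.0.0.0
agent.sources.s1.port = 12345
# A source can feed one or more channels, so the property name is plural
agent.sources.s1.channels = c1

# Channel: buffers the in-flight events between the source and the sink
agent.channels.c1.type = memory

# Sink: drains exactly one channel, so the property name is singular
agent.sinks.k1.type = logger
agent.sinks.k1.channel = c1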

The primary types we'll cover here are a memory-backed/nondurable channel and a local-filesystem-backed/durable channel. Starting with Flume 1.5, an experimental hybrid memory and file channel called the Spillable Memory Channel was introduced. The durable file channel flushes all changes to disk before acknowledging receipt of the event to the sender. This is considerably slower than using the nondurable memory channel, but it provides recoverability in the event of system or Flume agent restarts. Conversely, the memory channel is much faster, but a failure results in data loss, and it has much lower storage capacity than the multiterabyte disks that can back the file channel. This is why the Spillable Memory Channel was created. In theory, you get...
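To make the trade-off concrete, the following sketch declares one channel of each type; the capacity figures and directory paths are illustrative placeholders, not recommendations from the book:

agent.channels = c1 c2 c3

# Memory channel: fast, nondurable, bounded by available heap
agent.channels.c1.type = memory
agent.channels.c1.capacity = 10000

# File channel: slower, durable, bounded by local disk space
agent.channels.c2.type = file
agent.channels.c2.checkpointDir = /var/flume/checkpoint
agent.channels.c2.dataDirs = /var/flume/data

# Spillable Memory Channel (experimental as of Flume 1.5):
# keeps events in memory and spills the overflow to disk when full
agent.channels.c3.type = SPILLABLEMEMORY
agent.channels.c3.memoryCapacity = 10000
agent.channels.c3.overflowCapacity = 1000000
agent.channels.c3.checkpointDir = /var/flume/spill-checkpoint
agent.channels.c3.dataDirs = /var/flume/spill-data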