Apache Flume: Distributed Log Collection for Hadoop

Introduced in Flume 1.5, the Spillable Memory Channel is a channel that acts like a memory channel until it is full. At that point, it acts like a file channel that is configured with a much larger capacity than its memory counterpart but runs at the speed of your disks (which means orders of magnitude slower).

Note

The Spillable Memory Channel is still considered experimental. Use it at your own risk!

I have mixed feelings about this new channel type. On the surface, it seems like a good idea, but in practice, I can see problems. Specifically, having a variable channel speed that changes depending on how downstream entities in your data pipe behave makes for difficult capacity planning. As a memory channel is used under good conditions, this implies that the data contained in it can be lost. So why would I go through extra trouble to save some of it to the disk? The data is either very important for me to spool it to disk with a file-backed channel, or it's less important...

Apache Flume: Distributed Log Collection for Hadoop

By : Steven Hoffman

Apache Flume: Distributed Log Collection for Hadoop

By: Steven Hoffman

Overview of this book

Related Content you might be interested in

Current Title:

Apache Flume: Distributed Log Collection for Hadoop

Spillable Memory Channel

Note