Book Image

Apache Flume: Distributed Log Collection for Hadoop

By : Steve Hoffman, Steven Hoffman
Book Image

Apache Flume: Distributed Log Collection for Hadoop

By: Steve Hoffman, Steven Hoffman

Overview of this book

Table of Contents (16 chapters)
Apache Flume: Distributed Log Collection for Hadoop Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

The file channel


A file channel is a channel that stores events to the local filesystem of the agent. Though it's slower than the memory channel, it provides a durable storage path that can survive most issues and should be used in use cases where a gap in your data flow is undesirable.

This durability is provided by a combination of a Write Ahead Log (WAL) and one or more file storage directories. The WAL is used to track all input and output from the channel in an atomically safe way. This way, if the agent is restarted, the WAL can be replayed to make sure all the events that came into the channel (puts) have been written out (takes) before the stored data can be purged from the local filesystem.

Additionally, the file channel supports the encryption of data written to the filesystem if your data handling policy requires that all data on the disk (even temporarily) be encrypted. I won't cover this here, but should you need it, there is an example in the Flume User Guide (http://flume.apache...