Apache Flume: Distributed Log Collection for Hadoop

The memory channel offers speed at the cost of data loss in the event of failure. Alternatively, the file channel provides a more reliable transport in that it can tolerate agent failures and restarts at a performance cost.

You will need to decide which channel is appropriate for your use cases. When trying to decide whether a memory channel is appropriate, ask yourself what the monetary cost is if you lose some data. Weigh that against the additional costs of more hardware to cover the difference in performance when deciding if you need a durable channel after all. Another consideration is whether or not the data can be resent. Not all data you might ingest into Hadoop will come from streaming application logs. If you receive "daily downloads" of data, you can get away with using a memory channel because if you encounter a problem, you can always rerun the import.

Finally...

Apache Flume: Distributed Log Collection for Hadoop

By : Steven Hoffman

Apache Flume: Distributed Log Collection for Hadoop

By: Steven Hoffman

Overview of this book

Related Content you might be interested in

Current Title:

Apache Flume: Distributed Log Collection for Hadoop

Summary