Apache Flume: Distributed Log Collection for Hadoop

If you are writing a Java program that creates data, you can send that data directly as structured data using a special mode of Flume called the Embedded Agent. It is essentially a simple single-source, single-channel Flume agent that runs inside your JVM.

There are benefits and drawbacks to this approach. On the positive side, you don't need to monitor an additional process on your servers to relay data. The embedded channel also lets the data producer continue executing its code immediately after queuing the event to the channel. The SinkRunner thread takes events from the channel and sends them to the configured sinks. Even if you didn't use the Embedded Agent to perform this handoff from the calling thread, you would most likely use some kind of synchronized queue (such as BlockingQueue) to isolate the sending of the data from the main execution thread. Using the Embedded Agent provides the same functionality without having to worry whether you...
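To make the alternative concrete, here is a minimal sketch of the hand-rolled BlockingQueue handoff described above, using only the Java standard library. It is not Flume code: the class name QueueHandoffSketch, its methods, and the in-memory "delivered" list (a stand-in for a real network send) are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical hand-rolled relay; illustrates the queue handoff, not Flume itself. */
public class QueueHandoffSketch {
    private final BlockingQueue<String> queue;
    // Assumption: this list stands in for whatever actually ships the data.
    private final List<String> delivered =
            Collections.synchronizedList(new ArrayList<>());
    private final Thread worker;
    private volatile boolean running = true;

    public QueueHandoffSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(this::drain, "relay-worker");
    }

    public void start() {
        worker.start();
    }

    /** Called by the producer; returns immediately, false if the queue is full. */
    public boolean offer(String event) {
        return queue.offer(event);
    }

    /** Runs on the worker thread, playing the role a sink runner would play. */
    private void drain() {
        try {
            // Keep going until stop() is called and the queue has been emptied.
            while (running || !queue.isEmpty()) {
                String event = queue.poll(50, TimeUnit.MILLISECONDS);
                if (event != null) {
                    delivered.add(event); // placeholder for the real send
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Signals the worker to finish draining, then waits for it to exit. */
    public void stop() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns a snapshot of everything the worker has handed off so far. */
    public List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    public static void main(String[] args) {
        QueueHandoffSketch relay = new QueueHandoffSketch(100);
        relay.start();
        relay.offer("event-1"); // the producer does not wait for delivery
        relay.offer("event-2");
        relay.stop();
        System.out.println(relay.delivered());
    }
}
```

Even this small version has to decide what happens when the queue is full, how to shut down without losing queued events, and how to handle interruption. Those are the kinds of details you would otherwise own yourself.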

By: Steven Hoffman

The embedded agent