Apache Flume: Distributed Log Collection for Hadoop

In this chapter, we covered how to download the Flume binary distribution. We created a simple configuration file that included one source writing to one channel, feeding one sink. The source listened on a socket for network clients to connect to and to send it event data. These events were written to an in-memory channel and then fed to a Log4j sink to become the output. We then connected to our listening agent using the Linux netcat utility and sent some string events to our Flume agent's source. Finally, we verified that our Log4j-based sink wrote the events out.

In the next chapter, we'll take a detailed look at the two major channel types you'll most likely use in your data processing workflows: the memory channel and the file channel.

We will also take a look at a new experimental channel, introduced in Version 1.5 of Flume, called the Spillable Memory Channel, which attempts to be a hybrid of the other two.

For each type, we'll discuss all the configuration knobs available to...

Apache Flume: Distributed Log Collection for Hadoop

By : Steven Hoffman

Apache Flume: Distributed Log Collection for Hadoop

By: Steven Hoffman

Overview of this book

Related Content you might be interested in

Current Title:

Apache Flume: Distributed Log Collection for Hadoop

Summary