Apache Flume: Distributed Log Collection for Hadoop

In this chapter, we covered in depth the various sources that we can use to insert log data into Flume, including the Exec source, the Spooling Directory source, Syslog sources (UDP, TCP, and multiport TCP), and the JMS source.

We discussed replicating the old TailSource functionality from Flume 0.9 and the problems with using tail semantics in general.

We also covered channel selectors and sending events to one or more channels, specifically the replicating and multiplexing channel selectors.
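As a reminder of what the two selectors look like in an agent configuration, here is a minimal sketch. The agent name (agent), the source and channel names (s1, c1, c2, c3), the header name (logLevel), and its values are all hypothetical placeholders chosen for illustration; only the selector property keys are Flume's own.

```properties
# Hypothetical agent with one source wired to three channels.
agent.sources = s1
agent.channels = c1 c2 c3
agent.sources.s1.channels = c1 c2 c3

# Multiplexing: route each event according to the value of a header.
# The header name "logLevel" and its values are assumptions for this sketch.
agent.sources.s1.selector.type = multiplexing
agent.sources.s1.selector.header = logLevel
agent.sources.s1.selector.mapping.ERROR = c1
agent.sources.s1.selector.mapping.WARN = c2
# Events whose header matches no mapping go to the default channel.
agent.sources.s1.selector.default = c3

# Replicating (the default selector) would instead copy every event
# to all of the source's channels:
# agent.sources.s1.selector.type = replicating
```

With the multiplexing selector, an event reaches only the channel (or channels) mapped to its header value, whereas the replicating selector writes the same event to every listed channel.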

Optional channels were also discussed as a way to fail a put transaction only when a write to a required channel fails; when more than one channel is used, failures on the channels marked optional are ignored.
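A minimal sketch of marking a channel as optional with the replicating selector follows. The agent, source, and channel names are again hypothetical; the example assumes c1 is the channel whose delivery must succeed and c2 is best effort.

```properties
# Hypothetical agent: s1 replicates every event to c1 and c2.
agent.sources = s1
agent.channels = c1 c2
agent.sources.s1.channels = c1 c2
agent.sources.s1.selector.type = replicating

# c2 is optional: a failed write to c2 does not fail the source's
# put transaction. A failed write to c1 (required) still does.
agent.sources.s1.selector.optional = c2
```

Channels not listed as optional remain required, so the source still sees a failure, and can retry, whenever a required channel cannot accept the event.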

In the next chapter, we'll introduce interceptors that will allow in-flight inspection and transformation of events. Used in conjunction with channel selectors, interceptors provide the final piece to create complex data flows with Flume. Additionally, we will cover RPC mechanisms (source/sink pairs) between Flume agents using both Avro and Thrift, which can...

By: Steven Hoffman
