Apache Flume: Distributed Log Collection for Hadoop

We covered the following interceptors:
Timestamp: Adds a timestamp header to each event, overwriting any existing one unless configured to preserve it.
Host: Adds the Flume agent's hostname or IP address as an event header.
Static: Adds fixed (static) string headers.
Regular expression filtering: Includes or excludes events depending on whether the event body matches a regular expression.
Regular expression extractor: Creates headers from the groups matched by a regular expression, which is useful for routing events with Channel Selectors.
Morphline: Delegates the transformation to a Morphline command chain.
Custom: Lets you write any transformation that the built-in interceptors do not provide.
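To make the semantics of these interceptors concrete, here is a minimal, self-contained Java sketch of an interceptor chain. It deliberately does not use Flume's API (real custom interceptors implement `org.apache.flume.interceptor.Interceptor`); `SimpleEvent`, `SimpleInterceptor`, and the factory methods below are hypothetical stand-ins that model four of the listed behaviors: timestamp, static, regular expression filtering, and regular expression extraction. Each interceptor returns the event, possibly with new headers, or `null` to drop it, and the chain stops at the first drop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of an interceptor chain (NOT Flume's classes).
public class InterceptorChainSketch {

    // Minimal stand-in for a Flume event: string headers plus a text body.
    static final class SimpleEvent {
        final Map<String, String> headers = new HashMap<>();
        final String body;
        SimpleEvent(String body) { this.body = body; }
    }

    // Returns the (possibly modified) event, or null to drop it.
    interface SimpleInterceptor {
        SimpleEvent intercept(SimpleEvent e);
    }

    // Timestamp: set "timestamp" to the current time, unless preserveExisting
    // is true and the header is already present.
    static SimpleInterceptor timestamp(boolean preserveExisting) {
        return e -> {
            if (!(preserveExisting && e.headers.containsKey("timestamp"))) {
                e.headers.put("timestamp", Long.toString(System.currentTimeMillis()));
            }
            return e;
        };
    }

    // Static: add a fixed key/value header, keeping any existing value.
    static SimpleInterceptor staticHeader(String key, String value) {
        return e -> { e.headers.putIfAbsent(key, value); return e; };
    }

    // Regex filtering: keep matching events, or drop them if excludeEvents is true.
    static SimpleInterceptor regexFilter(String regex, boolean excludeEvents) {
        Pattern p = Pattern.compile(regex);
        return e -> {
            boolean matches = p.matcher(e.body).find();
            return (matches != excludeEvents) ? e : null;
        };
    }

    // Regex extractor: capture group i+1 becomes the header named headerNames[i].
    static SimpleInterceptor regexExtractor(String regex, String... headerNames) {
        Pattern p = Pattern.compile(regex);
        return e -> {
            Matcher m = p.matcher(e.body);
            if (m.find()) {
                for (int i = 0; i < headerNames.length && i < m.groupCount(); i++) {
                    e.headers.put(headerNames[i], m.group(i + 1));
                }
            }
            return e;
        };
    }

    // Run every event through the chain in order; a null result drops the event.
    static List<SimpleEvent> applyChain(List<SimpleEvent> events, List<SimpleInterceptor> chain) {
        List<SimpleEvent> out = new ArrayList<>();
        for (SimpleEvent e : events) {
            SimpleEvent current = e;
            for (SimpleInterceptor i : chain) {
                current = i.intercept(current);
                if (current == null) break; // dropped by a filter
            }
            if (current != null) out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleInterceptor> chain = List.of(
                regexFilter("DEBUG", true),          // exclude DEBUG lines
                timestamp(false),                    // always (re)stamp
                staticHeader("datacenter", "dc1"),   // hypothetical header
                regexExtractor("^(\\w+) ", "level")); // first word -> "level"
        List<SimpleEvent> events = List.of(
                new SimpleEvent("INFO user logged in"),
                new SimpleEvent("DEBUG cache miss"),
                new SimpleEvent("ERROR disk full"));
        for (SimpleEvent e : applyChain(events, chain)) {
            System.out.println(e.headers.get("level") + " "
                    + e.headers.get("datacenter") + " " + e.body);
        }
    }
}
```

Running `main` drops the DEBUG line and prints the two remaining events with their extracted `level` and static `datacenter` headers. Placing the filter first in the chain avoids adding headers to events that will be discarded anyway.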

We also covered tiering data flows with the Avro source and sink, including optional compression and SSL encryption between tiers. Finally, we briefly covered Thrift sources and sinks, since some environments may already have Thrift data flows to integrate with...
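Compression on an Avro hop is enabled on both ends: Flume's Avro sink and source accept a `compression-type` of `none` or `deflate`, and the two sides must agree. As a rough illustration of why this helps for log traffic, the sketch below round-trips a synthetic batch of repetitive log lines through DEFLATE using only `java.util.zip`. It is not Flume's networking code; `DeflateBatchSketch` and its methods are hypothetical names, and real savings depend on your data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration of DEFLATE on a batch of log lines (not Flume code).
public class DeflateBatchSketch {

    // Compress bytes with DEFLATE (zlib format), as a sending tier might.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress, as the receiving tier would.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                inflater.end();
                throw new DataFormatException("truncated or dictionary-based input");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        // Synthetic, highly repetitive batch of log lines.
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            batch.append("INFO web01 GET /index.html 200\n");
        }
        byte[] raw = batch.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(raw);
        byte[] restored = inflate(packed);
        System.out.printf("raw=%d bytes, deflated=%d bytes, roundtrip ok=%b%n",
                raw.length, packed.length, Arrays.equals(raw, restored));
    }
}
```

Repetitive lines like these compress very well; the trade-off is extra CPU on both the sending and the receiving agent.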

By: Steven Hoffman