Apache Flume: Distributed Log Collection for Hadoop, Second Edition

By: Steven Hoffman

Summary


In this chapter, we covered various interceptors shipped with Flume, including:

  • Timestamp: This is used to add a timestamp header, overwriting an existing one unless configured to preserve it.

  • Host: This is used to add the Flume agent hostname or IP as a header in the event.

  • Static: This is used to add static String headers.

  • Regular expression filtering: This is used to include or exclude events based on a matched regular expression.

  • Regular expression extractor: This is used to create headers from matched regular expressions. It's useful for routing with Channel Selectors.

  • Morphline: This is used to delegate transformation to a Morphline command chain.

  • Custom: This is used to implement any transformation you need that the built-in interceptors don't provide.
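To make the list above concrete, here is a minimal, self-contained Java sketch of how a chain of interceptors transforms events before they reach a channel, followed by multiplexing-style routing on a header produced by the regex extractor. It deliberately avoids Flume's own classes: `SimpleEvent`, `SketchInterceptor`, `applyChain`, and `route` are hypothetical names, and the header keys, host name, and sample log lines are made up for illustration. A real custom interceptor would instead implement Flume's `org.apache.flume.interceptor.Interceptor` interface (`initialize()`, `intercept(Event)`, `intercept(List<Event>)`, `close()`) and be created through an `Interceptor.Builder`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InterceptorChainSketch {

    // Hypothetical stand-in for a Flume event: string headers plus a body.
    static final class SimpleEvent {
        final Map<String, String> headers = new HashMap<>();
        final String body;
        SimpleEvent(String body) { this.body = body; }
    }

    // Hypothetical contract: return the (possibly modified) event, or null to drop it.
    interface SketchInterceptor {
        SimpleEvent intercept(SimpleEvent e);
    }

    // Timestamp: overwrite any existing "timestamp" header unless preserveExisting is set.
    static SketchInterceptor timestamp(long nowMillis, boolean preserveExisting) {
        return e -> {
            if (!preserveExisting || !e.headers.containsKey("timestamp")) {
                e.headers.put("timestamp", Long.toString(nowMillis));
            }
            return e;
        };
    }

    // Host: the host name is passed in here rather than looked up from the machine.
    static SketchInterceptor host(String hostName) {
        return e -> { e.headers.put("host", hostName); return e; };
    }

    // Static: add a fixed header, keeping any value already present.
    static SketchInterceptor staticHeader(String key, String value) {
        return e -> { e.headers.putIfAbsent(key, value); return e; };
    }

    // Regex filtering: drop matching events (exclude=true) or non-matching ones (exclude=false).
    static SketchInterceptor regexFilter(String regex, boolean exclude) {
        Pattern p = Pattern.compile(regex);
        return e -> (p.matcher(e.body).find() == exclude) ? null : e;
    }

    // Regex extractor: copy capture groups 1..n into the named headers, in order.
    static SketchInterceptor regexExtractor(String regex, String... headerNames) {
        Pattern p = Pattern.compile(regex);
        return e -> {
            Matcher m = p.matcher(e.body);
            if (m.find()) {
                for (int i = 0; i < headerNames.length && i < m.groupCount(); i++) {
                    String group = m.group(i + 1);
                    if (group != null) e.headers.put(headerNames[i], group);
                }
            }
            return e;
        };
    }

    // Run each event through the chain in order; a null result drops the event.
    static List<SimpleEvent> applyChain(List<SimpleEvent> events, List<SketchInterceptor> chain) {
        List<SimpleEvent> out = new ArrayList<>();
        for (SimpleEvent e : events) {
            SimpleEvent current = e;
            for (SketchInterceptor interceptor : chain) {
                current = interceptor.intercept(current);
                if (current == null) break;
            }
            if (current != null) out.add(current);
        }
        return out;
    }

    // Multiplexing-style routing: map a header value to a channel name, else use a default.
    static String route(SimpleEvent e, String header, Map<String, String> mapping, String defaultChannel) {
        String value = e.headers.get(header);
        return value == null ? defaultChannel : mapping.getOrDefault(value, defaultChannel);
    }

    public static void main(String[] args) {
        List<SketchInterceptor> chain = List.of(
            timestamp(1_700_000_000_000L, false),
            host("collector-01"),
            staticHeader("datacenter", "dc1"),
            regexFilter("^DEBUG", true),
            regexExtractor("^(\\w+) ", "level"));
        List<SimpleEvent> in = List.of(
            new SimpleEvent("ERROR disk full"),
            new SimpleEvent("DEBUG heartbeat"),
            new SimpleEvent("INFO started"));
        Map<String, String> mapping = Map.of("ERROR", "alerts");
        for (SimpleEvent e : applyChain(in, chain)) {
            System.out.println(e.body + " -> " + route(e, "level", mapping, "archive"));
        }
    }
}
```

Running `main` prints `ERROR disk full -> alerts` and `INFO started -> archive`; the `DEBUG` event is dropped by the filter before the extractor ever sees it. As in a Flume agent, interceptors run in the order they are listed, so placing a filter early avoids extracting headers from events that will be discarded anyway.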

We also covered tiering data flows using the Avro source and sink. Optional compression and SSL with Avro flows were covered as well. Finally, Thrift sources and sinks were briefly covered, as some environments may already have Thrift data flows to integrate with...
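The trade-off behind optional compression on an Avro hop can be illustrated with the JDK's built-in deflate codec. In Flume, the Avro sink and Avro source each have a `compression-type` setting (`none` by default, or `deflate`), and both ends of a hop must use the same value. The Java sketch below does not reproduce Flume's or Avro's wire format; the class name `DeflateHopSketch` and the sample log line are hypothetical. It only shows how much a repetitive batch of log lines shrinks under deflate and that the receiving side must inflate with the matching codec to recover the original bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateHopSketch {

    // Sending side: deflate a payload at the given level (0-9).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Receiving side: inflate; only works if the sender used deflate too.
    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        // Repetitive log lines, similar to what a first-tier agent might forward.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("2024-01-01 INFO web-01 GET /index.html 200\n");
        }
        byte[] batch = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] wire = compress(batch, 6);
        byte[] restored = decompress(wire);
        System.out.println("raw=" + batch.length + " bytes, deflated=" + wire.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(batch, restored));
    }
}
```

Compression spends CPU on both agents to save network bandwidth between tiers, which tends to pay off for highly repetitive log data crossing slower links.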