Book Image

Apache Flume: Distributed Log Collection for Hadoop

By : Steven Hoffman
Book Image

Apache Flume: Distributed Log Collection for Hadoop

By: Steven Hoffman

Overview of this book

Table of Contents (16 chapters)
Apache Flume: Distributed Log Collection for Hadoop Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Summary


In this chapter, we discussed the problem that Flume is attempting to solve: getting data into your Hadoop cluster for data processing in an easily configured, reliable way. We also discussed the Flume agent and its logical components, including events, sources, channel selectors, channels, sink processors, and sinks. Finally, we briefly discussed Morphlines as a powerful new ETL (Extract, Transform, Load) library, starting with Version 1.4 of Flume.

The next chapter will cover these in more detail, specifically, the most commonly used implementations of each. Like all good open source projects, almost all of these components are extensible if the bundled ones don't do what you need them to do.