Book Image

Apache Flume: Distributed Log Collection for Hadoop

By : Steven Hoffman
Book Image

Apache Flume: Distributed Log Collection for Hadoop

By: Steven Hoffman

Overview of this book

Table of Contents (16 chapters)
Apache Flume: Distributed Log Collection for Hadoop Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Syslog sources


Syslog has been around for decades and is often used as an operating-system-level mechanism to capture and move logs around systems. In many ways, there are overlaps with some of the functionality Flume provides. There is even a Hadoop module for rsyslog, one of the more modern variants of syslog (http://www.rsyslog.com/doc/rsyslog_conf_modules.html/omhdfs.html). Generally, I don't like solutions that couple technologies that may version independently. If you use this rsyslog/Hadoop integration, you would be required to update the version of Hadoop you compiled into rsyslog at the same time you upgraded your Hadoop cluster to a new major version. This may be logistically difficult if you have a large number of servers and/or environments. Backward compatibility in Hadoop wire protocols is something that is being actively worked on in the Hadoop community, but currently, it isn't the norm. We'll talk more about this in Chapter 8, Monitoring Flume, when we discuss tiering data...