
Apache Flume: Distributed Log Collection for Hadoop

By: Steve Hoffman, Steven Hoffman

Apache Flume: Distributed Log Collection for Hadoop Second Edition

Monitoring performance metrics


Now that we have covered some options for process monitoring, how do you know whether your application is actually doing the work you think it is? On many occasions, I've seen a stuck syslog-ng process that appeared to be running but wasn't sending any data. I'm not picking on syslog-ng specifically; any software can do this when it hits conditions it was not designed for.

When talking about Flume data flows, you need to monitor the following:

  • Data entering sources is within expected rates

  • Data isn't overflowing your channels

  • Data is exiting sinks at expected rates
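The three checks above can be sketched as a comparison of two metric snapshots taken a known number of seconds apart. This is only an illustration, not something Flume ships: the snapshot layout and the counter names (`EventAcceptedCount`, `ChannelFillPercentage`, `EventDrainSuccessCount`) are modeled on Flume's JSON metrics output, so verify them against your own agent, and the `check_flow` function and its thresholds are hypothetical.

```python
# Sketch only: key names are assumed to match Flume's JSON metrics output
# (component names like "SOURCE.r1", counter values reported as strings).

def _rate(prev, curr, key, interval_s):
    """Events per second for a monotonically increasing counter."""
    delta = float(curr.get(key, 0)) - float(prev.get(key, 0))
    return delta / interval_s


def check_flow(prev, curr, interval_s, min_rate=1.0, max_fill_pct=80.0):
    """Return a list of warnings for one agent.

    prev, curr: dicts of {component_name: {metric_name: value}}.
    min_rate and max_fill_pct are hypothetical thresholds; tune them
    to what is normal for your own data flow.
    """
    warnings = []
    for name, metrics in curr.items():
        kind = name.split(".", 1)[0]  # "SOURCE", "CHANNEL" or "SINK"
        before = prev.get(name, {})
        if kind == "SOURCE":
            rate = _rate(before, metrics, "EventAcceptedCount", interval_s)
            if rate < min_rate:
                warnings.append(f"{name}: intake {rate:.1f} events/s is below {min_rate}")
        elif kind == "CHANNEL":
            fill = float(metrics.get("ChannelFillPercentage", 0))
            if fill > max_fill_pct:
                warnings.append(f"{name}: channel is {fill:.1f}% full")
        elif kind == "SINK":
            rate = _rate(before, metrics, "EventDrainSuccessCount", interval_s)
            if rate < min_rate:
                warnings.append(f"{name}: drain {rate:.1f} events/s is below {min_rate}")
    return warnings
```

A sink whose drain counter has not moved between snapshots is exactly the "running but not sending data" case described earlier, and a rate check like this catches it where a process check would not.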

Flume has a pluggable monitoring framework, but as mentioned at the beginning of the chapter, it is still very much a work in progress. This does not mean you shouldn't use it; that would be foolish. It means you'll want to budget extra testing and integration time any time you upgrade.
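One way to spend that extra testing time is a small smoke test that runs after each upgrade and confirms that the metrics your alerts depend on are still being reported. The following is a minimal sketch under stated assumptions: the `EXPECTED` table lists counter names I believe Flume's JSON metrics output uses, but you should replace it with whatever your own checks actually read, and `missing_metrics` is a hypothetical helper.

```python
# Sketch only: EXPECTED is an assumption about which counters your
# monitoring reads; adjust it to match your own agent's output.

EXPECTED = {
    "SOURCE": {"EventReceivedCount", "EventAcceptedCount"},
    "CHANNEL": {"ChannelSize", "ChannelCapacity", "ChannelFillPercentage"},
    "SINK": {"EventDrainAttemptCount", "EventDrainSuccessCount"},
}


def missing_metrics(snapshot, expected=EXPECTED):
    """Return {component_name: [missing metric names]} for one snapshot.

    snapshot: dict of {component_name: {metric_name: value}}, where the
    component name starts with its type, for example "SINK.k1".
    Components of a type not listed in `expected` are ignored.
    """
    missing = {}
    for name, metrics in snapshot.items():
        kind = name.split(".", 1)[0]
        absent = expected.get(kind, set()) - set(metrics)
        if absent:
            missing[name] = sorted(absent)
    return missing
```

An empty result means every expected counter is present. Anything else is worth investigating before you trust your dashboards on the new version.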

Note

While not covered in the Flume documentation, it is common to enable JMX in your Flume JVM (http://bit.ly...