Book Image

Apache Flume: Distributed Log Collection for Hadoop

By : Steven Hoffman
Book Image

Apache Flume: Distributed Log Collection for Hadoop

By: Steven Hoffman

Overview of this book

Table of Contents (16 chapters)
Apache Flume: Distributed Log Collection for Hadoop Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Time zones are evil


In case you missed my bias against using local time in Chapter 4, Sinks and Sink Processors, I'll repeat it here a little stronger: time zones are evil—evil like Dr. Evil (http://en.wikipedia.org/wiki/Dr._Evil)—and let's not forget about his Mini Me counterpart, (http://en.wikipedia.org/wiki/Mini-Me)—Daylight Savings Time.

We live in a global world now. You are pulling data from all over the place into your Hadoop cluster. You may even have multiple data centers in different parts of the country (or the world). The last thing you want to be doing while trying to analyze your data is to deal with askew data. Daylight Savings Time changes at least somewhere on Earth a dozen times in a year. Just look at the history: ftp://ftp.iana.org/tz/releases/. Save yourself the headache and just normalize it to UTC. If you want to convert it to "local time" on its way to human eyeballs, feel free. However, while it lives in your cluster, keep it normalized to UTC.

Note

Consider adopting...