Learning Hadoop 2

Hopefully, this chapter presented data life cycle management as something other than a dry, abstract concept. We covered a lot of ground, in particular:

  • The definition of data life cycle management and how it covers a number of issues and techniques that usually become important with large data volumes

  • The concept of building a data ingest pipeline along good data life cycle management principles that can then be utilized by higher-level analytic tools

  • Oozie as a Hadoop-focused workflow manager and how we can use it to compose a series of actions into a unified workflow

  • Various Oozie features, such as subworkflows, parallel action execution, and global variables, that allow us to apply sound design principles to our workflows

  • HCatalog and how it provides the means for tools other than Hive to read and write table-structured data; we showed its great promise and integration with tools such as Pig but also highlighted some current weaknesses

  • Avro as our tool of choice to handle schema evolution...