Learning Cascading

Understanding common Cascading themes


Cascading is an application development and data processing framework that makes Hadoop development simpler and more durable over time. It simplifies development by reducing the number of lines of code required to perform most tasks, and its durability comes from seamless support of Hadoop and other big data frameworks.

Developing Hadoop applications with MapReduce and other frameworks is difficult and has a very steep learning curve, as you have seen in the previous chapter. Cascading significantly simplifies this process by providing an intuitive, high-level abstraction over Hadoop application development, job creation, and job processing. With Cascading, developers can create reusable components and pipe assemblies that are reliable and scalable.
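The idea of reusable components chained into a pipe assembly can be pictured with a minimal plain-Java sketch. It deliberately does not use the Cascading API: the class name PipeAssemblySketch and the NORMALIZE, DROP_EMPTY, and CLEAN functions are hypothetical names chosen for illustration, and an in-memory list stands in for real Hadoop data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java analogy only: none of these names come from the Cascading API.
class PipeAssemblySketch {

    // A reusable "component": trims and lower-cases every record.
    static final Function<List<String>, List<String>> NORMALIZE =
        records -> records.stream()
                          .map(r -> r.trim().toLowerCase(Locale.ROOT))
                          .collect(Collectors.toList());

    // Another reusable component: drops records that are empty.
    static final Function<List<String>, List<String>> DROP_EMPTY =
        records -> records.stream()
                          .filter(r -> !r.isEmpty())
                          .collect(Collectors.toList());

    // An "assembly": existing components chained into one reusable unit.
    static final Function<List<String>, List<String>> CLEAN =
        NORMALIZE.andThen(DROP_EMPTY);

    public static void main(String[] args) {
        // In-memory stand-ins for a data source and a data sink.
        List<String> source = Arrays.asList("  Hadoop ", "", "CASCADING");
        List<String> sink = CLEAN.apply(source);
        System.out.println(sink); // prints [hadoop, cascading]
    }
}
```

Because each stage is an ordinary value, the same NORMALIZE or DROP_EMPTY step can be reused in other chains without rewriting it, which is the property the paragraph above attributes to Cascading's components and pipe assemblies.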

Data flows as processes

Imagine a plumbing system, a water filtration system, or a brewery: anything that takes in a stream of some kind, processes it, and ultimately delivers it for use. The system usually starts with some source...