YARN Essentials

YARN – Alternative Solutions

Apache Samza

Samza is an open source project from LinkedIn and, at the time of writing, an incubation project at the Apache Software Foundation. It is a lightweight distributed stream-processing framework for real-time data processing. The version available for download from the Apache website is not the production version that LinkedIn uses.

Samza is made up of the following three layers:

  • A streaming layer

  • An execution layer

  • A processing layer

Samza provides out-of-the-box support for all three of the preceding layers:

  • Streaming: supported by Kafka (another open source project from LinkedIn)

  • Execution: supported by YARN

  • Processing: supported by the Samza API
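To make the division of labor between the three layers concrete, here is a minimal conceptual sketch in Python. It does not use Samza, Kafka, or YARN; the names InMemoryStream, LocalRunner, uppercase_task, and demo are hypothetical stand-ins invented for illustration. The point it tries to show is that the per-message processing logic only needs a way to receive and emit messages, while message transport and task execution are handled by separate components.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

# NOTE: every class and function here is a hypothetical stand-in.
# None of these names belong to the Samza, Kafka, or YARN APIs.


class InMemoryStream:
    """Streaming-layer stand-in (the role Kafka plays): named, ordered queues."""

    def __init__(self) -> None:
        self._topics: Dict[str, Deque[str]] = {}

    def send(self, topic: str, message: str) -> None:
        # Append the message to the end of the named topic.
        self._topics.setdefault(topic, deque()).append(message)

    def poll(self, topic: str) -> Optional[str]:
        # Return the oldest unread message, or None if the topic is empty.
        queue = self._topics.get(topic)
        return queue.popleft() if queue else None

    def messages(self, topic: str) -> List[str]:
        # Snapshot of the messages still waiting in a topic.
        return list(self._topics.get(topic, []))


def uppercase_task(message: str, emit: Callable[[str, str], None]) -> None:
    """Processing-layer stand-in: logic applied to one message at a time."""
    emit("words-upper", message.upper())


class LocalRunner:
    """Execution-layer stand-in (the role YARN plays): decides how the task runs.

    This version simply loops in the current process.
    """

    def __init__(self, stream: InMemoryStream,
                 task: Callable[[str, Callable[[str, str], None]], None]) -> None:
        self.stream = stream
        self.task = task

    def run(self, input_topic: str) -> int:
        # Feed every pending message to the task; return how many were handled.
        processed = 0
        while True:
            message = self.stream.poll(input_topic)
            if message is None:
                break
            self.task(message, self.stream.send)
            processed += 1
        return processed


def demo() -> Tuple[int, List[str]]:
    stream = InMemoryStream()
    for word in ["samza", "kafka", "yarn"]:
        stream.send("words", word)
    runner = LocalRunner(stream, uppercase_task)
    count = runner.run("words")
    return count, stream.messages("words-upper")


if __name__ == "__main__":
    print(demo())  # (3, ['SAMZA', 'KAFKA', 'YARN'])
```

In this sketch, uppercase_task never refers to InMemoryStream or LocalRunner directly, so either stand-in could be replaced without touching the processing logic.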

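These three pieces fit together to form Samza:

[Figure: Samza's streaming (Kafka), execution (YARN), and processing (Samza API) layers]

This architecture should be familiar to anyone who has used Hadoop:

[Figure: the corresponding Hadoop architecture]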

Before going into each of these three layers in depth, it should be noted that Samza's support is not limited to these systems. Both Samza's execution and streaming layers are pluggable and allow developers...