Samza is an open source project from LinkedIn and is currently an incubation project at the Apache Software Foundation. Samza is a lightweight distributed stream-processing framework to do real-time processing of data. The version that is available for download from the Apache website is not the production version that LinkedIn uses.
Samza is made up of the following three layers:
A streaming layer
An execution layer
A processing layer
Samza provides out-of-the-box support for all the preceding three layers:
The following three pieces fit together to form Samza:
The following architecture should be familiar to anyone who has used Hadoop:
Before going into each of these three layers indepth, it should be noted that Samza's support is not limited to these systems. Both Samza's execution and streaming layers are pluggable and allow developers...