Learning Hadoop 2

Stream processing with Samza


To explore a pure stream-processing platform, we will use Samza, which is available at https://samza.apache.org. The code shown here was tested with the current 0.8 release, and we'll keep the GitHub repository updated as the project continues to evolve.

Samza was built at LinkedIn and donated to the Apache Software Foundation in September 2013. Over the years, LinkedIn has built a model that conceptualizes much of its data as streams, and from this the company saw the need for a framework that provides a developer-friendly mechanism for processing these ubiquitous data streams.

The team at LinkedIn realized that when it came to data processing, much of the attention went to the extreme ends of the spectrum. At one end are RPC workloads, which are usually implemented as synchronous systems with very low latency requirements; at the other are batch systems, where the periodicity of jobs is often measured in hours. The ground in between has been relatively poorly supported, and this is the area...