
Learning Hadoop 2


Comparing Samza and Spark Streaming


It is useful to compare Samza and Spark Streaming to help identify the areas in which each can best be applied. As we have hopefully made clear in this book, these technologies are very much complementary. Even though Spark Streaming might appear to compete with Samza, we feel both products offer compelling advantages in certain areas.

Samza shines when the input data is truly a stream of discrete events and you want to build processing that operates on each event as it arrives. Samza jobs running on Kafka can achieve latencies on the order of milliseconds. Samza provides a programming model focused on individual messages, which makes it the better fit for true near-real-time processing applications. Though it lacks built-in support for building topologies of collaborating jobs, its simple model allows similar constructs to be built and, perhaps more importantly, to be easily reasoned about. Its model of partitioning and scaling also focuses on simplicity, which again makes a Samza...
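To make the per-message, per-partition model more concrete, here is a minimal, dependency-free Python simulation. It does not use Samza's actual API (the classic Samza API has jobs implement the Java `StreamTask` interface). The class `PageViewCounterTask`, the helpers `partition_for` and `run`, and the CRC32-based routing are hypothetical stand-ins chosen only for illustration. Each input partition gets its own task instance, which handles messages strictly one at a time and keeps local state only for the keys routed to it.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PageViewCounterTask:
    """Hypothetical per-partition task (not Samza's API): processes one
    message at a time and keeps state only for keys in its partition."""
    partition: int
    counts: Dict[str, int] = field(default_factory=dict)

    def process(self, key: str, page: str, collector: List[Tuple[int, str, int]]) -> None:
        # Invoked once per message, in arrival order within this partition.
        # The page payload is ignored; we only count views per user key.
        self.counts[key] = self.counts.get(key, 0) + 1
        collector.append((self.partition, key, self.counts[key]))


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash so the same key always lands in the same partition.
    # Illustrative assumption: Kafka's own partitioner uses a different hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def run(messages: List[Tuple[str, str]], num_partitions: int) -> List[Tuple[int, str, int]]:
    # One independent task instance per partition; scaling out means
    # adding partitions, and therefore more task instances.
    tasks = [PageViewCounterTask(p) for p in range(num_partitions)]
    output: List[Tuple[int, str, int]] = []
    for key, page in messages:
        tasks[partition_for(key, num_partitions)].process(key, page, output)
    return output
```

A usage example (the partition numbers printed depend on the hash, so none are shown here):

```python
events = [("alice", "/home"), ("bob", "/cart"), ("alice", "/checkout")]
for partition, user, count in run(events, num_partitions=4):
    print(partition, user, count)
```

Because routing is by key, all of a user's events reach the same task instance, so its local count is always complete without any coordination between tasks. This is the property that keeps the model easy to reason about.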