HDInsight Essentials - Second Edition

By: Rajesh Nadipalli

Storm


Apache Storm is a scalable, fault-tolerant, distributed, real-time computation system. Storm makes it easy to reliably process unbounded streams of data. Typical use cases include real-time analytics, online machine learning, continuous computation, and ETL. In a published benchmark, Storm processed over 1 million tuples per second per node. The key features of Storm are:

  • Real-time computation

  • Guaranteed data processing

  • Scalable

  • Fault tolerant
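Guaranteed data processing rests on replay. In Storm, a reliable spout emits each tuple with a message ID, and Storm later tells the spout whether processing succeeded (ack) or failed (fail). The plain-Python sketch below models only that idea under simplifying assumptions and is not Storm's API. The names `ReplayingSpout`, `next_tuple`, `ack`, `fail`, and `drain` are hypothetical; they only loosely echo Storm's spout methods `nextTuple`, `ack`, and `fail`. Timeouts and tuple-tree tracking are omitted.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Hashable, Iterable, List, Optional, Tuple


class ReplayingSpout:
    """Hypothetical sketch (not Storm's API): every emitted tuple stays
    pending until it is acked; a failed tuple is queued again for replay."""

    def __init__(self, source: Iterable[Tuple[Hashable, Any]]) -> None:
        self._queue: Deque[Tuple[Hashable, Any]] = deque(source)
        self._pending: Dict[Hashable, Any] = {}

    def next_tuple(self) -> Optional[Tuple[Hashable, Any]]:
        """Emit the next (msg_id, value) pair, or None when nothing is queued."""
        if not self._queue:
            return None
        msg_id, value = self._queue.popleft()
        self._pending[msg_id] = value  # keep it until processing is confirmed
        return msg_id, value

    def ack(self, msg_id: Hashable) -> None:
        """Processing succeeded: the tuple can be forgotten."""
        self._pending.pop(msg_id, None)

    def fail(self, msg_id: Hashable) -> None:
        """Processing failed: put the tuple back at the end of the queue."""
        value = self._pending.pop(msg_id)
        self._queue.append((msg_id, value))

    def pending(self) -> int:
        """Number of emitted tuples that are not yet acked or failed."""
        return len(self._pending)


def drain(spout: ReplayingSpout, process: Callable[[Any], bool]) -> List[Any]:
    """Pull tuples until the queue is empty, acking successes and failing errors.
    This sketch retries without limit, so a tuple that always fails loops forever."""
    done: List[Any] = []
    while (item := spout.next_tuple()) is not None:
        msg_id, value = item
        if process(value):
            spout.ack(msg_id)
            done.append(value)
        else:
            spout.fail(msg_id)
    return done
```

Because a failed tuple is emitted again, the guarantee this models is at-least-once: a tuple can be processed more than once, so downstream steps should tolerate duplicates.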

    Note

    At the time this book was written, Storm was a preview feature in Azure HDInsight.

Storm positioning in Data Lake

Hadoop and MapReduce provide strong batch processing capabilities, and HBase provides a low-latency data store. Storm adds low-latency transformation, so raw data can be processed in real time as it arrives.

Let's consider our airline on-time performance use case. In the previous chapters, we saw how to ingest, transform, and analyze historical data using batch processing. With Storm, we can now process real-time feeds and analyze both historical...
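To illustrate real-time processing of an airline feed, here is a minimal plain-Python sketch of the kind of per-tuple work a Storm bolt performs. It keeps a running average arrival delay per carrier and updates it as each event arrives, rather than recomputing over the full history as a batch job would. This is not Storm's API, and the names `FlightEvent`, `carrier`, `arr_delay_min`, `RunningDelayBolt`, and `run_stream` are hypothetical. The method name `execute` only echoes the name of Storm's bolt method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class FlightEvent:
    """One tuple from a hypothetical real-time on-time performance feed."""
    carrier: str        # airline code, e.g. "AA" (hypothetical field name)
    arr_delay_min: int  # arrival delay in minutes; negative means early


class RunningDelayBolt:
    """Bolt-like step (not Storm's API): keeps a running average arrival
    delay per carrier and updates it as each tuple arrives."""

    def __init__(self) -> None:
        self._count: Dict[str, int] = defaultdict(int)
        self._total: Dict[str, int] = defaultdict(int)

    def execute(self, event: FlightEvent) -> Tuple[str, float]:
        # Constant work per tuple: update this carrier's count and sum
        # instead of rescanning the full history.
        self._count[event.carrier] += 1
        self._total[event.carrier] += event.arr_delay_min
        average = self._total[event.carrier] / self._count[event.carrier]
        # Return the updated value for a downstream consumer,
        # for example a write to a low-latency store such as HBase.
        return event.carrier, average


def run_stream(events: Iterable[FlightEvent]) -> Iterator[Tuple[str, float]]:
    """Push each event from a spout-like source through the bolt, in order."""
    bolt = RunningDelayBolt()
    for event in events:
        yield bolt.execute(event)


if __name__ == "__main__":
    feed = [FlightEvent("AA", 10), FlightEvent("DL", -5), FlightEvent("AA", 30)]
    for carrier, avg in run_stream(feed):
        print(carrier, avg)  # prints: AA 10.0, then DL -5.0, then AA 20.0
```

The incremental state (a count and a sum per carrier) is what makes each update cheap. The same averages could be recomputed from the historical data with a batch job, which is how the streaming and batch views can be reconciled.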